To the beautiful interplay
of pure and applied mathematics 


Preface 


The theories of quadratic forms and their applications appear in many 
parts of mathematics and the sciences. All students of mathematics have 
the opportunity to encounter such concepts and applications in their first 
course in linear algebra. This subject and its extensions to infinite dimen- 
sions comprise the theory of the numerical range W(T). There are two 
competing names for W(T), namely, the numerical range of T and the 
field of values for T. The former has been favored historically by the func- 
tional analysis community, the latter by the matrix analysis community. It 
is a toss-up to decide which is preferable, and we have finally chosen the 
former because it is our habit, it is a more efficient expression, and because 
in recent conferences dedicated to W(T), even the linear algebra commu- 
nity has adopted it. Also, one universally refers to the numerical radius, 
and not to the field of values radius. Originally, Toeplitz and Hausdorff 
called it the Wertvorrat of a bilinear form, so other good names would be 
value field or form values. The Russian community has referred to it as the 
Hausdorff domain. Murnaghan in his early paper first called it the region 
of the complex plane covered by those values for an n x n matrix T, then 
the range of values of a Hermitian matrix, then the field of values when he 
analyzed what he called the sought-for region. Marshall Stone (1932), in 
his influential book on operator theory, chose to use the name numerical 
range for W(T). 

We know of no book dedicated to presenting the fundamentals of this 
subject. Our goal here is to do so. Our hope is that this interesting and 
useful subject will thereby become available to a wider audience. To that 
end, we have chosen what we call the roadmap approach in writing this 
book. We want it to be quickly informative as to principal cities and main 
routes, but without getting the reader lost in secondary byways or overly 
general description. For this reason, we place and keep the subject squarely 
in a complex Hilbert space. This setting is the heart of the numerical range 
theory for bounded linear operators T and naturally contains the field of 
values theory for finite-dimensional matrices T. 

The outline of the book is as follows. In Chapter 1, we have selected the 
most fundamental properties of the numerical range W(T). These include 
its convexity and its inclusion of the spectrum of T within its closure. In 
Chapter 2, we present mapping theorems relating W(T) to the spectral 
properties of T. The best known of these are probably the power
inequality and the dilation theory, but other important mapping properties
are given there as well. Chapter 3 describes an operator trigonometry in
which we ourselves have played a major role. This includes sharp criteria
for accretive operator products $\operatorname{Re} W(T_1 T_2) \ge 0$, and a theory of
antieigenvalues of an arbitrary operator T. We believe the latter will eventually
become a standard chapter in linear algebra and will be useful in a variety 
of applications. Chapter 4 investigates connections between the numerical 
range W(T) and numerical analysis. This includes applications to schemes 
used in computational fluid dynamics and an improved convergence the- 
ory for certain numerical iterative algorithms from optimization theory. In 
Chapter 5, we expose important properties of W(T) for matrix theory, the 
finite-dimensional case. We also develop the essentials of some of the vari- 
ations of W(T), such as the C numerical range, which have been of recent 
interest. We conclude in Chapter 6 with a presentation of the properties of 
certain interesting classes of operators that although no longer symmetric 
or normal still enjoy some important properties of those operators. These 
operator classes are defined in terms of their numerical range W(T) prop- 
erties and include the normaloid, convexoid, and spectraloid operators. 


A Word on Notation 


We use T, A, B, ... to denote a linear operator or matrix, usually bounded
and everywhere defined on a finite- or infinite-dimensional Hilbert space
$\mathbb{C}^n$, H, X, .... Although the letter T predominates for linear
operator, we use whichever letter seems convenient depending on the context
or to delineate it from another operator. Recall that T is bounded if the
set of values $\|Tx\|$, where x is in the unit sphere of H (i.e., $\|x\| = 1$),
is bounded. In Hilbert space, this is the same as saying that T is
continuous. The least upper bound of such $\|Tx\|$ is called the bound M or
operator norm $\|T\|$ of T.
In similar fashion, we are not overly worried about forcing a single no- 
tation onto the entities that occur in the numerical range literature, which 
cuts across several mathematical disciplines, most notably functional anal- 
ysis and linear algebra. Thus, in the literature, the numerical radius may 
appear as |W(T)|,w(T), or M(T), depending on the context, and the nu- 
merical lower bound may appear as |w(T)|, when compared to the upper 
bound |W(T)|, or as m(T), or as just m, when compared in the selfad- 
joint case to the upper bound M. Convexity occurs throughout numerical 
range research, and one finds notations such as co, Conv, or conv hull for 
the convex hull of a set. Usually, if it is a toss-up, we have followed the 
operator theory notation in preference to the matrix theory notations or 
those notations appearing in inequality theory. For the convenience of the 
reader, we have included a brief glossary of symbols we frequently use. 
We have chosen the bracket notation $\langle y, x \rangle$ for the inner
product in the Hilbert space, which has the advantage of no confusion with
parentheses in ordinary mathematical or vector expressions. Recall that the
inner product is conjugate dual,
$\langle y, x \rangle = \overline{\langle x, y \rangle}$, and consistent
with the norm, $\langle x, x \rangle = \|x\|^2$. We do not hesitate to
identify a linear functional $y^*$ with its representing inner product,
i.e., $y^* x = \langle x, y \rangle$ according to the Riesz-Fischer Theorem.
In other words, we accept the usual custom of regarding a Hilbert space as
selfdual.

A word about the adjoint operator $T^*$. This is the dual operator, which
occurs in some functional analysis contexts as $T'$ or in matrix theory
contexts as the conjugate transpose $\overline{T}^{\,T}$. Recall that $T^*$
is defined by the duality relationship


\[ \langle Tx, y \rangle = \langle x, T^* y \rangle \]


for all x and y in H. The adjoint operator $T^*$ has nothing to do with the
adjoint matrix Adj(A), which occurs as the transpose of the cofactor matrix
in an expression for the inverse $A^{-1}$ which occurs in most elementary
linear algebra books.

We say that T is selfadjoint when T = T*, which presupposes of course 
the usual identification of the space H with its dual H*. This is the same as 
saying that T is symmetric, if one stays with everywhere defined bounded 
operators T. In physics, the terminology T is Hermitian is sometimes 
preferred. In matrix theory, the symmetric matrices $A = A^T$ must have
real entries $a_{ij}$, whereas the larger class of Hermitian matrices $A = A^*$
comprises the set of selfadjoint operators. As is well known, selfadjoint 
operators T have the nice property that all eigenvalues are real. As will be 
immediately seen, they are exactly those operators whose numerical range 
W (T) is real. 


Prerequisites and References 


As prerequisites to this book, the preceding discussion on notation indi- 
cates that the reader should have some background experience with the 
theory of linear operators on a Hilbert space, functional analysis, or linear 
algebra. Even a first linear algebra course is sufficient to begin the study 
of the numerical range W(T). We assume that the operator T or matrix 
T and basis representing it are known to us before we start looking at the 
properties of T’s numerical range W(T). In general, we shall assume that 
all readers, beginning or expert, are willing to consult their beginning lin- 
ear algebra or functional analysis textbooks to refresh their memory where 
needed. There are so many good treatments of those subjects that we 
would offend all by listing any. 

However, for both efficiency and honesty, we would like to mention some 
references here that provide particularly appropriate background for this 
book, so that we may refer to them. These include the books that, to our 
knowledge, present to date the most information on the numerical range 
W(T). For functional analysis, one has the classic by F. Riesz and B. Sz.
Nagy (1955), Functional Analysis, for the theory of linear operators on a 
Hilbert space developed in a natural way; and for Hilbert space operator 
theory, P. Halmos (1982), A Hilbert Space Problem Book, 2nd ed. Both 
of these books include a discussion of the numerical range W (T), and the 
latter has a full chapter on it. For linear algebra, we mention R. A. Horn
and C. R. Johnson (1985), Matrix Analysis, and its sequel, Topics in
Matrix Analysis (1991). The latter has a full chapter on the numerical range.
For numerical linear algebra, we will refer to G. H. Golub and C. F. Van 
Loan (1989), Matrix Computations, which has recently become a standard 
treatise. Then there is the interesting recent book, M. Marcus (1993), 
Matrices and Matlab. Not only is the presentation elementary and easy 
to read, but the author, who has long been interested in W(T), devotes 
considerable attention to it. For Banach space and Banach algebra gen- 
eralizations of the numerical range, there are the monographs F. Bonsall 
and J. Duncan (1971, 1973), Numerical Ranges of Operators on Normed 
Spaces and Elements of Normed Algebras and Numerical Ranges II. These 
monographs are motivated by the application of numerical range concepts 
to the study of Banach algebras. These background sources will be referred 
to as follows: 


[RN] Riesz and Sz. Nagy, [H] Halmos, [HJ1] Horn and Johnson I,
[HJ2] Horn and Johnson II, [GL] Golub and Van Loan, [M] Marcus,
[BD] Bonsall and Duncan I, II. 


Finally, we appreciate the forbearance of our families during the
preparation of this book; the diligent word-processing of the manuscript by
Elizabeth Stimmel at UC-Boulder; the assistance of Dr. Guido Sartoris in
writing code for the W(T) graphics; the hospitality of the organizers,
Professors T. Ando and C. K. Li, and Professor Natalia Bebiano, respectively,
at the First and Second Conferences on The Numerical Range and Numer- 
ical Radius at the College of William and Mary in Williamsburg, VA, and 
Universidade de Coimbra, Portugal, in 1992 and 1994, and the Departa- 
mento de Matematicas of the Universidad del Valle, Cali, Colombia, and 
the Department of Mathematics of the University of Colorado, Boulder, 
CO, for enabling both of us to visit the other’s host institution during this 
writing. 

Karl E. Gustafson 


Duggirala K. M. Rao 
August, 1995 
the complex numbers

the real numbers 

the numerical range of T 
the numerical radius of T 


the boundary of the numerical range 
the closure of the numerical range 


the spectrum of T 
the point spectrum of T 


the approximate point spectrum 


the spectral radius 

the resolvent of T 

the norm of T 

the range of T 

the nullspace of T 

the convex hull of S 
the interior of S 

the linear span of M 
direct sum of M and N 
direct sum of A and B 
a dilation of T 

lower numerical bound of T 
upper numerical bound of T 
the angle of T 

the cosine of T 

the sine of T 

the exponential of T 
sesquilinear form 
antieigenvalue of T 
quadratic error norm 
gradient of f 

condition number of T 
time step 

grid size 

diffusion matrix 
convection matrix 
pseudo-spectrum of T
trace of A 

n x n matrices in C” 
Hadamard product 
Kronecker product 

spatial numerical range 
C-numerical range 
C-numerical radius 
c-numerical range 

x majorized by y 
k-numerical range 
algebraic numerical range 
restricted numerical range 
M-numerical range 
-numerical range 
symmetric numerical range 
Numerical Range


Introduction 


Quadratic forms and their use in linear algebra are quite well known. A 
natural extension of these ideas in finite- and infinite-dimensional spaces 
leads us to the numerical range. 

We deal with bounded linear operators on a complex, separable Hilbert 
space H with inner product ( , ). Many of these results remain true in 
the case of real Hilbert space and for nonseparable spaces, but generally 
we leave it to the reader to check those cases. When specialized to finite- 
dimensional spaces, the numerical range is often called the field of values. 

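The numerical range of an operator T is the subset of the complex
numbers $\mathbb{C}$ given by

\[ W(T) = \{ \langle Tx, x \rangle : x \in H, \ \|x\| = 1 \}. \]

The following properties of W(T) are immediate:

\[ W(\alpha I + \beta T) = \alpha + \beta W(T) \quad \text{for } \alpha, \beta \in \mathbb{C}, \]
\[ W(T^*) = \{ \bar\lambda : \lambda \in W(T) \}, \]
\[ W(U^* T U) = W(T) \quad \text{for any unitary } U. \]

Example 1. In $\mathbb{C}^2$ let T be the operator defined by the matrix

\[ T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

If $x = (f, g)$, $\|x\|^2 = |f|^2 + |g|^2 = 1$, we have $Tx = (g, 0)$ and
$\langle Tx, x \rangle = g \bar f$. Notice that
$|\langle Tx, x \rangle| = |g||f| \le \frac12 (|f|^2 + |g|^2) = \frac12$, and thus

\[ W(T) \subset \{ z : |z| \le \tfrac12 \}. \]

Letting $z = re^{i\theta}$, $0 \le r \le \frac12$, if we choose
$x = (\cos\alpha, e^{i\theta}\sin\alpha)$, where $\sin 2\alpha = 2r \le 1$
and $0 \le \alpha \le \pi/4$, we see that

\[ \langle Tx, x \rangle = e^{i\theta} \sin\alpha \cos\alpha = re^{i\theta}. \]

Thus $W(T) = \{ z : |z| \le \frac12 \}$, the closed disk of radius $\frac12$
centered at the origin.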
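
The computation in Example 1 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text: the helper names (`rayleigh_2x2`, `random_unit_vector`, `max_sampled_modulus`, `attained_point`) are hypothetical, and the inner product is assumed linear in the first slot and conjugate-linear in the second.

```python
import cmath
import math
import random

# The matrix of Example 1.
T = [[0, 1], [0, 0]]

def rayleigh_2x2(t, x):
    """Return the value (Tx, x) for a 2x2 matrix t and a vector x in C^2."""
    tx = [t[0][0] * x[0] + t[0][1] * x[1],
          t[1][0] * x[0] + t[1][1] * x[1]]
    # Linear in the first slot, conjugate-linear in the second (assumed convention).
    return tx[0] * x[0].conjugate() + tx[1] * x[1].conjugate()

def random_unit_vector(rng):
    """Draw a random unit vector in C^2 from normalized Gaussian samples."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]

def max_sampled_modulus(samples=2000, seed=0):
    """Largest modulus of (Tx, x) over randomly sampled unit vectors."""
    rng = random.Random(seed)
    return max(abs(rayleigh_2x2(T, random_unit_vector(rng)))
               for _ in range(samples))

def attained_point(r, theta):
    """Evaluate (Tx, x) at x = (cos a, e^{i theta} sin a) with sin 2a = 2r."""
    a = 0.5 * math.asin(2 * r)
    x = [complex(math.cos(a)), cmath.exp(1j * theta) * math.sin(a)]
    return rayleigh_2x2(T, x)
```

Sampling can only suggest the bound $|z| \le \frac12$, whereas `attained_point` reproduces the explicit vector from the example and so shows that each prescribed point of the disk is actually reached.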

Example 2. Let T be the unilateral shift on H, the Hilbert space $\ell^2$ of
square summable sequences. For any $f = (f_1, f_2, \ldots) \in H$, $\|f\| = 1$, we
have $Tf = (f_2, f_3, \ldots)$ and hence consider

\[ \langle Tf, f \rangle = f_2 \bar f_1 + f_3 \bar f_2 + f_4 \bar f_3 + \cdots \]

with $|f_1|^2 + |f_2|^2 + \cdots = 1$. Notice that

\[ |\langle Tf, f \rangle| \le |f_1||f_2| + |f_2||f_3| + \cdots
   \le \tfrac12 \left[ |f_1|^2 + 2|f_2|^2 + 2|f_3|^2 + \cdots \right]
   = \tfrac12 \left[ 2 - |f_1|^2 \right]. \]

Thus $|\langle Tf, f \rangle| < 1$ if $|f_1| \ne 0$. For $|f_1| = 0$ and f containing a finite
number of nonzero entries, we can show in the same way that $|\langle Tf, f \rangle| < 1$
by considering the minimum natural number n for which $f_n \ne 0$.

Thus W(T) is contained in the open unit disk $\{ z : |z| < 1 \}$. We now
show that it is in fact the open unit disk. Let $z = re^{i\theta}$, $0 \le r < 1$, be any
point of this disk. Consider

\[ f = \left( \sqrt{1 - r^2}, \ r\sqrt{1 - r^2}\, e^{i\theta}, \ r^2 \sqrt{1 - r^2}\, e^{2i\theta}, \ \ldots \right). \]

Observe that

\[ \|f\|^2 = 1 - r^2 + r^2 (1 - r^2) + r^4 (1 - r^2) + \cdots = 1. \]

Furthermore,

\[ \langle Tf, f \rangle = r(1 - r^2) e^{i\theta} + r^3 (1 - r^2) e^{i\theta} + \cdots = re^{i\theta}. \]
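
The two geometric series in Example 2 can be checked on a finite truncation of the sequence f. This is only an illustrative sketch: the function name `shift_form_truncated` and the truncation length are assumptions, and truncating leaves a geometrically small error.

```python
import cmath
import math

def shift_form_truncated(r, theta, terms=200):
    """Return (||f||^2, (Tf, f)) for the first `terms` coordinates of
    f_k = r^(k-1) sqrt(1 - r^2) e^{i(k-1) theta}, where (Tf)_k = f_{k+1}."""
    f = [r ** k * math.sqrt(1 - r * r) * cmath.exp(1j * k * theta)
         for k in range(terms)]
    norm_sq = sum(abs(c) ** 2 for c in f)
    # (Tf, f) = sum over k of f_{k+1} * conj(f_k)
    form = sum(f[k + 1] * f[k].conjugate() for k in range(terms - 1))
    return norm_sq, form
```

For r well below 1 the truncated norm is 1 and the form value is $re^{i\theta}$ to machine precision; as r approaches 1, more terms are needed, which mirrors the fact that the boundary of the disk is not attained.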

The following example shows the calculation of the numerical range as 
the envelope of a family of circles. 

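Example 3. Let the transformation $A : \mathbb{C}^2 \to \mathbb{C}^2$ be represented by

\[ A = \begin{pmatrix} r & b \\ 0 & -r \end{pmatrix}, \qquad r \in \mathbb{R}, \ b \in \mathbb{C}. \]

Let $(f, g)$ be a unit vector in $\mathbb{C}^2$, $f = e^{i\alpha} \cos\theta$, $g = e^{i\beta} \sin\theta$,
$\theta \in [0, \pi/2]$, $\alpha, \beta \in [0, 2\pi)$. Then we have

\[ A(f, g) = \left( re^{i\alpha} \cos\theta + be^{i\beta} \sin\theta, \ -re^{i\beta} \sin\theta \right) \]

and

\[ \langle A(f, g), (f, g) \rangle = r(\cos^2\theta - \sin^2\theta) + be^{i(\beta - \alpha)} \sin\theta \cos\theta = x + iy, \]

\[ x = r \cos 2\theta + \frac{|b|}{2} \sin 2\theta \cos(\beta - \alpha + \gamma), \]

\[ y = \frac{|b|}{2} \sin(\beta - \alpha + \gamma) \sin 2\theta, \qquad \gamma = \arg b. \]

So

\[ (x - r \cos 2\theta)^2 + y^2 = \frac{|b|^2}{4} \sin^2 2\theta. \]

This is a family of circles and we can now obtain their union.
Rewriting this last expression as

\[ (x - r \cos\phi)^2 + y^2 = \frac{|b|^2}{4} \sin^2 \phi, \qquad 0 \le \phi \le \pi, \]

and differentiating w.r.t. $\phi$, we get

\[ (x - r \cos\phi)\, r = \frac{|b|^2}{4} \cos\phi. \]

Eliminating $\phi$ between the last two equations, one obtains

\[ \frac{x^2}{r^2 + |b|^2/4} + \frac{y^2}{|b|^2/4} = 1. \]

This is an ellipse with center at 0, minor axis $|b|$, and major axis $\sqrt{4r^2 + |b|^2}$.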
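
The envelope computation of Example 3 says that every value of the form lies inside the ellipse $x^2/(r^2 + |b|^2/4) + y^2/(|b|^2/4) \le 1$. The sketch below tests this on sampled unit vectors; the function names and the sampling scheme are hypothetical choices made for illustration, not taken from the text.

```python
import cmath
import math
import random

def form_value(r, b, theta, alpha, beta):
    """Value of (A u, u) for A = [[r, b], [0, -r]] and the unit vector
    u = (e^{i alpha} cos theta, e^{i beta} sin theta)."""
    f = cmath.exp(1j * alpha) * math.cos(theta)
    g = cmath.exp(1j * beta) * math.sin(theta)
    au = (r * f + b * g, -r * g)
    return au[0] * f.conjugate() + au[1] * g.conjugate()

def ellipse_level(z, r, b):
    """x^2/(r^2 + |b|^2/4) + y^2/(|b|^2/4); at most 1 inside the ellipse."""
    q = abs(b) ** 2 / 4
    return z.real ** 2 / (r * r + q) + z.imag ** 2 / q

def max_level(r, b, samples=5000, seed=1):
    """Largest ellipse level over randomly sampled angles."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        theta = rng.uniform(0, math.pi / 2)
        alpha = rng.uniform(0, 2 * math.pi)
        beta = rng.uniform(0, 2 * math.pi)
        z = form_value(r, b, theta, alpha, beta)
        worst = max(worst, ellipse_level(z, r, b))
    return worst
```

Choosing $\theta = \pi/4$ and $\beta - \alpha + \gamma = \pi/2$ gives the point $i|b|/2$, an endpoint of the minor axis, so the level 1 is attained there.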

The most fundamental property of the numerical range is its convexity. 
Other important properties are that its closure contains the spectrum of 
the operator and that the numerical radius provides a norm equivalent to 
the operator norm. These and other basic properties of the numerical range 
and its boundary are established in this chapter. 


1.1 Elliptic Range 


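Lemma 1.1-1 (Ellipse Lemma). Let T be an operator on a two-dimensional
space. Then W(T) is an ellipse whose foci are the eigenvalues of T.


Proof. Without loss of generality (see the notes at the end of the section),
we can choose T as an upper triangular matrix

(1.1-1)    \[ T = \begin{pmatrix} \lambda_1 & a \\ 0 & \lambda_2 \end{pmatrix}, \]

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of T.
If $\lambda_1 = \lambda_2 = \lambda$, we have

\[ T - \lambda I = \begin{pmatrix} 0 & a \\ 0 & 0 \end{pmatrix}, \qquad
   W(T - \lambda I) = \left\{ z : |z| \le \frac{|a|}{2} \right\} \]

(see Example 1) and W(T) is a circular disk with center at $\lambda$ and radius $|a|/2$.
If $\lambda_1 \ne \lambda_2$ and $a = 0$, we have

\[ T = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \]

If $x = (f, g)$, $\langle Tx, x \rangle = \lambda_1 |f|^2 + \lambda_2 |g|^2 = t\lambda_1 + (1 - t)\lambda_2$,
where $t = |f|^2$ and $|f|^2 + |g|^2 = 1$. So W(T) is the set of convex
combinations of $\lambda_1$ and $\lambda_2$ and is the segment joining them.

If $\lambda_1 \ne \lambda_2$ and $a \ne 0$, we have

\[ T - \frac{\lambda_1 + \lambda_2}{2} I =
   \begin{pmatrix} \frac{\lambda_1 - \lambda_2}{2} & a \\ 0 & \frac{\lambda_2 - \lambda_1}{2} \end{pmatrix}, \]

(1.1-2)    \[ e^{-i\theta} \left[ T - \frac{\lambda_1 + \lambda_2}{2} I \right] =
   \begin{pmatrix} r & ae^{-i\theta} \\ 0 & -r \end{pmatrix} = B, \]

where $\frac{\lambda_1 - \lambda_2}{2} = re^{i\theta}$. W(B) is an ellipse with center at (0,0), and minor
axis $|a|$ (see Example 3), and foci at $(r, 0)$ and $(-r, 0)$. Thus W(T) is an
ellipse with foci at $\lambda_1, \lambda_2$, and the major axis has an inclination of $\theta$ with
the real axis. □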
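
One way to see the foci statement concretely: for a point z of an ellipse with foci $\lambda_1, \lambda_2$, the sum $|z - \lambda_1| + |z - \lambda_2|$ is at most the length of the major axis, which by Example 3 applied to B is $\sqrt{|\lambda_1 - \lambda_2|^2 + |a|^2}$. The following sketch, with a hypothetical function name and sampling scheme, checks this on random unit vectors.

```python
import math
import random

def sampled_focal_excess(l1, l2, a, samples=5000, seed=2):
    """Largest sampled value of |z - l1| + |z - l2| minus the major axis
    length, where z = (Tx, x) for T = [[l1, a], [0, l2]] and random unit x."""
    rng = random.Random(seed)
    major_axis = math.sqrt(abs(l1 - l2) ** 2 + abs(a) ** 2)
    worst = float("-inf")
    for _ in range(samples):
        f = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        g = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(abs(f) ** 2 + abs(g) ** 2)
        f, g = f / norm, g / norm
        # (Tx, x) with Tx = (l1 f + a g, l2 g)
        z = (l1 * f + a * g) * f.conjugate() + l2 * g * g.conjugate()
        worst = max(worst, abs(z - l1) + abs(z - l2) - major_axis)
    return worst
```

A nonpositive result (up to rounding) is consistent with the lemma. In the degenerate case a = 0 the focal sum equals $|\lambda_1 - \lambda_2|$ for every sample, reflecting that W(T) collapses to the segment joining the eigenvalues.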


Theorem 1.1-2 (Toeplitz-Hausdorff). The numerical range of an operator
is convex.


Proof. Let $\alpha, \beta \in W(T)$, $\alpha = \langle Tf, f \rangle$, and $\beta = \langle Tg, g \rangle$, with
$\|f\| = \|g\| = 1$. We have to show that the segment joining $\alpha$ and $\beta$ is
contained in W(T). Let V be the subspace spanned by f and g and E the
orthogonal projection of H on V, so that Ef = f and Eg = g. We also have,
for the operator ETE on V,


\[ \langle ETEf, f \rangle = \langle Tf, f \rangle \]


and


\[ \langle ETEg, g \rangle = \langle Tg, g \rangle. \]


By the ellipse lemma, W(ETE) is an ellipse. Hence W(ETE) contains the
segment joining $\alpha$ and $\beta$. It is easy to see that $W(ETE) \subset W(T)$, so that
W(T) contains the segment joining $\alpha$ and $\beta$. □

From Lemma 1.1-1 and Theorem 1.1-2, we see that we may regard the
numerical range W(T) as the union of all of its two-dimensional numerical
ranges, which are ellipses. These ellipses may be degenerate, as is the case,
for example, in the important instance when T is selfadjoint. Then W(T) is
just an interval in the real line $\mathbb{R}$. Thus, we see that W(T) need not possess
an interior. The ellipse degeneracies also permit straight line polygonal
boundaries of W(T). Generally, W(T) is neither open nor closed, except
in the finite-dimensional case. Then, as it is the jointly continuous image
of two compact sets, it is compact.


Notes and References for Section 1.1 


The Schur decomposition theorem (see [HJ1], [GL], or [M]) guarantees that
any square matrix T may be transformed by a unitary similarity
transformation to upper triangular form, with its eigenvalues on the
diagonal. Also, since W(T) is invariant under unitary transformations, it
suffices to consider only upper triangular matrices in Lemma 1.1-1.
Although obvious, this unitary invariance of W(T), as for the spectrum
$\sigma(T)$, is essential to many conclusions.

Another proof of Lemma 1.1-1 can be found in 
W. F. Donoghue (1957), “On the Numerical Range of a Bounded Opera- 
tor,” Michigan Math. J. 4, 261-263. 
An earlier proof was given in the case of finite dimensions in 
F. D. Murnaghan (1932). “On the Field of Values of a Square Matrix,” 
Proc. Nat. Acad. Sci. U.S.A. 18, 246-248. 

One may also obtain the ellipse nature of W(A) for A a 2 × 2 matrix
by inequalities. By the unitary invariance and by a translation and then a
constant normalization, we may assume that

\[ A = \begin{pmatrix} r & 2 \\ 0 & -r \end{pmatrix}. \]

Any unit vector f may be written $f = e^{i\alpha} (\cos\theta, e^{i\phi} \sin\theta)$. Then

\[ f^* A f = (r \cos 2\theta + \cos\phi \sin 2\theta) + i(\sin\phi \sin 2\theta) = x + iy, \]

from which

\[ |f^* A f - r \cos 2\theta|^2 = \sin^2 2\theta = (x - r \cos 2\theta)^2 + y^2, \]

which clearly describes a circle ($\phi$ varying) centered at $r \cos 2\theta$, as in the
above proof. Now, instead of differentiating to find the envelope of these
circles, we may use the fact that $\cos 2\theta$ must be real and bounded in
magnitude by 1. From $(r^2 + 1) \cos^2 2\theta - 2xr \cos 2\theta + (x^2 + y^2 - 1) = 0$, by the
quadratic formula we have

\[ \cos 2\theta = \frac{xr \pm \left( (r^2 + 1) - (x^2 + y^2 (1 + r^2)) \right)^{1/2}}{r^2 + 1}. \]

That $\cos 2\theta$ must remain real yields the inequality

\[ \frac{x^2}{r^2 + 1} + y^2 \le 1, \]

an ellipse with semiaxes $(r^2 + 1)^{1/2}$ and 1. Then fixing $\phi$ and letting $\theta$ run
shows how the ellipse is filled, in an interesting way: for example, choose
$\phi = \pi/2$, then

\[ \frac{x^2}{r^2 + 1} + y^2 = \frac{r^2 + \sin^2 2\theta}{r^2 + 1}, \]

which varies from 1 down to $r^2/(r^2 + 1)$. This is a shell-like filling action.
One can, of course, by controlling the way in which $\phi$ and $\theta$ run, follow
other filling recipes. The envelope-finding proof we gave in Lemma 1.1-1
guarantees that all of these recipes will fill the ellipse.
Originally, Toeplitz proved that the boundary of W(T) was convex and
later Hausdorff proved that W(T) was simply connected.
O. Toeplitz (1918). “Das algebraische Analogon zu einem Satz von Fejér,” 
Math. Z. 2, 187-197. 
F. Hausdorff (1919), “Der Wertvorrat einer Bilinearform,” Math. Z. 3, 
314-316. 
The Toeplitz-Hausdorff Theorem has many proofs. To the best of our 
knowledge, the most recent is due to 
C. K. Li (1994). “C-Numerical Ranges and C-Numerical Radii,” Linear 
and Multilinear Algebra 37, 51-82. 
Other references to this theorem can be found in [H, HJ2, M, BD] and 
M. Goldberg and E. G. Straus (1979). “Norm Properties of C-Numerical 
Radii,” Linear Alg. Appl. 24, 113-131. 

A short proof covering unbounded operators was given in 
K. E. Gustafson (1970). “The Toeplitz—Hausdorff Theorem for Linear Op- 
erators,” Proc. Amer. Math. Soc. 25, 203-204. 


1.2 Spectral Inclusion 


One important use of W(T) is to bound the spectrum $\sigma(T)$. The spectrum
of an operator T consists of those complex numbers $\lambda$ such that $T - \lambda I$ is
not invertible. For our purpose of showing that the spectrum of an operator
is included in the closure of its numerical range, it is enough to look at
the boundary of the spectrum.

It is well known (see, for example, [H], Problem 63) that the boundary of
the spectrum is contained in the approximate point spectrum $\sigma_{app}(T)$, which
consists of complex numbers $\lambda$ for which there exists a sequence of unit
vectors $\{f_n\}$ with $\|(T - \lambda I) f_n\| \to 0$. Since W(T) is convex, it suffices to
show that $\sigma_{app}(T) \subset \overline{W(T)}$.


Theorem 1.2-1 (Spectral inclusion). The spectrum of an operator is
contained in the closure of its numerical range.


Proof. Consider any $\lambda \in \sigma_{app}(T)$ and a sequence $\{f_n\}$ of unit vectors with
$\|(T - \lambda I) f_n\| \to 0$. By the Schwarz inequality,

\[ |\langle (T - \lambda I) f_n, f_n \rangle| \le \|(T - \lambda I) f_n\| \to 0. \]

Thus $\langle T f_n, f_n \rangle \to \lambda$. So $\lambda \in \overline{W(T)}$. □

Notice that the spectral inclusion enables us to locate the spectrum of
the sum of any two operators A and B. Even though $\sigma(A + B)$ has nothing
to do in general with $\sigma(A)$ and $\sigma(B)$, we still have

\[ \sigma(A + B) \subset \overline{W(A + B)} \subset \overline{W(A)} + \overline{W(B)}. \]
As the set W(T) is convex, we can also conclude that the convex hull
$\mathrm{co}(\sigma(T))$ is contained in $\overline{W(T)}$. Generally speaking, even though W(T) is
often used to bound $\sigma(T)$, the latter can be much smaller.

Example. Let $H = \mathbb{C} \times \mathbb{C}$ and T the operator represented by the matrix

\[ T = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \]

If $f = (f_1, f_2)$, we have $Tf = (0, f_1)$, $\langle Tf, f \rangle = f_1 \bar f_2$, and
$W(T) = \{ \lambda \in \mathbb{C} : |\lambda| \le \frac12 \}$. However, $\sigma(T) = \{0\}$.
By contrast, selfadjoint operators T have their spectra bounded sharply 
by W(T), as is evidenced by the following simple relations between o(T) 
and W(T). 


Theorem 1.2-2. T is selfadjoint iff W(T) is real.


Proof. If T is selfadjoint, we have, for all $f \in H$,
$\langle Tf, f \rangle = \langle f, Tf \rangle = \overline{\langle Tf, f \rangle}$, and hence W(T) is real.
Conversely, if $\langle Tf, f \rangle$ is real for all $f \in H$, we have
$\langle Tf, f \rangle - \langle f, Tf \rangle = 0 = \langle (T - T^*) f, f \rangle$. Thus the operator $T - T^*$
has only {0} in its numerical range. As will be shown in the next section,
such an operator has to be the null operator. So $T - T^* = 0$ and $T = T^*$. □


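Theorem 1.2-3. Let T be selfadjoint and $\overline{W(T)}$ = the real interval [m, M].
Then $\|T\| = \sup\{|m|, |M|\}$.


Proof. Let $w(T) = \sup\{|m|, |M|\}$. Then, for any real $\lambda \ne 0$, we have
by the identity

(1.2-1)    \[ 4\|Tx\|^2 = \langle T(\lambda x + \lambda^{-1} Tx), \lambda x + \lambda^{-1} Tx \rangle
              - \langle T(\lambda x - \lambda^{-1} Tx), \lambda x - \lambda^{-1} Tx \rangle \]

the estimate

\[ 4\|Tx\|^2 \le w(T) \left[ \|\lambda x + \lambda^{-1} Tx\|^2 + \|\lambda x - \lambda^{-1} Tx\|^2 \right]
   = 2w(T) \left( \lambda^2 \|x\|^2 + \lambda^{-2} \|Tx\|^2 \right). \]

Taking $\lambda^2 = \|Tx\| / \|x\|$ yields $\|Tx\| \le w(T) \|x\|$. □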


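Theorem 1.2-4. Let $\overline{W(T)} = [m, M]$. Then $m, M \in \sigma(T)$.


Proof. Since $m \in \overline{W(T)}$, there is a sequence of unit vectors $\{f_n\}$ such that
$\langle T f_n, f_n \rangle \to m$. Hence $\langle (T - m) f_n, f_n \rangle = \|(T - m)^{1/2} f_n\|^2 \to 0$. Also,
since $\|(T - m) f_n\| \le \|(T - m)^{1/2}\| \, \|(T - m)^{1/2} f_n\|$, we have
$\|(T - m) f_n\| \to 0$ and so $m \in \sigma_{app}(T) \subset \sigma(T)$. The same argument applied
to $M - T$ gives $M \in \sigma(T)$. □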


Notes and References for Section 1.2 


The spectral inclusion was first given by 
A. Wintner (1929). “Zur Theorie der beschränkten Bilinearformen,” Math.
Z. 30, 228-282.

The spectral theory and spectral mapping theorem as well as the spectral
inclusion can be found in [H]. Let us recall here that those $\lambda$ not in the
spectrum $\sigma(T)$ are called the resolvent set $\rho(T)$ of T, and thereupon, the
operator $(T - \lambda I)^{-1}$ is called the resolvent operator for T.


1.3 Numerical Radius 


The numerical radius w(T) of an operator T on H is given by

\[ w(T) = \sup\{ |\lambda| : \lambda \in W(T) \}. \]

Notice that, for any vector $x \in H$, we have

\[ |\langle Tx, x \rangle| \le w(T) \|x\|^2. \]

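Example. Let T be the (right) shift operator on $\mathbb{C}^n$ defined by

\[ T = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}. \]

If $f \in \mathbb{C}^n$, $f = (f_1, f_2, \ldots, f_n)$, we have $Tf = (0, f_1, f_2, \ldots, f_{n-1})$ and

\[ \langle Tf, f \rangle = f_1 \bar f_2 + f_2 \bar f_3 + \cdots + f_{n-1} \bar f_n. \]

Thus $|\langle Tf, f \rangle| \le |f_1||f_2| + \cdots + |f_{n-1}||f_n|$. We have thus to calculate

\[ \sup\{ |f_1||f_2| + \cdots + |f_{n-1}||f_n| \} \]

subject to the condition $\sum_{i=1}^n |f_i|^2 = 1$.
Let $r_i = |f_i|$, and consider the Lagrange function

(1.3-1)    \[ F(r_1, r_2, \ldots, r_n, \lambda) = r_1 r_2 + \cdots + r_{n-1} r_n - \lambda \left( \sum_{i=1}^n r_i^2 - 1 \right). \]

From (1.3-1) we may set

(1.3-2)    \[ \begin{aligned}
\frac{\partial F}{\partial r_1} &= r_2 - 2\lambda r_1 = 0, \\
\frac{\partial F}{\partial r_2} &= r_1 + r_3 - 2\lambda r_2 = 0, \\
&\ \ \vdots \\
\frac{\partial F}{\partial r_{n-1}} &= r_{n-2} + r_n - 2\lambda r_{n-1} = 0, \\
\frac{\partial F}{\partial r_n} &= r_{n-1} - 2\lambda r_n = 0.
\end{aligned} \]

Writing $x = (r_1, r_2, \ldots, r_n)$ and

\[ A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 1 & \cdots & 0 & 0 \\
0 & 1 & 0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & 1 & 0 \\
0 & 0 & \cdots & 1 & 0 & 1 \\
0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}, \]

we can write the equations (1.3-2) in the form

\[ Ax = 2\lambda x. \]

Notice that A = 2B, where B is the Hermitian matrix which has $\frac12$
in the subdiagonal and the superdiagonal and 0 elsewhere. Thus $\lambda$ is an
eigenvalue of B. These eigenvalues of B are known to be (see Notes and
References)

\[ \lambda = \cos \frac{k\pi}{n+1}, \qquad k = 1, 2, \ldots, n. \]

Notice further that by multiplying (1.3-2) by $r_1, r_2, \ldots, r_n$ in order and
adding, we get

(1.3-3)    \[ 2(r_1 r_2 + \cdots + r_{n-1} r_n) - 2\lambda (r_1^2 + \cdots + r_n^2) = 0. \]

Thus $\lambda = r_1 r_2 + \cdots + r_{n-1} r_n$ since $r_1^2 + r_2^2 + \cdots + r_n^2 = 1$. Since the
maximum value of the expression for $\lambda$ is w(T), we have necessarily that

\[ w(T) = \sup\left\{ \cos \frac{k\pi}{n+1}, \ k = 1, 2, \ldots, n \right\}. \]

Thus

(1.3-4)    \[ w(T) = \cos \frac{\pi}{n+1}. \]

Notice that as $n \to \infty$, $w(T) \to 1$.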
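
The eigenvalue formula for B and the value (1.3-4) can be checked directly, using the standard fact that the vectors with entries $\sin(jk\pi/(n+1))$, $j = 1, \ldots, n$, are eigenvectors of this tridiagonal matrix. The sketch below uses hypothetical helper names and plain Python lists.

```python
import math

def tridiagonal_half_apply(v):
    """Apply B (entries 1/2 on the sub- and superdiagonal, 0 elsewhere) to v."""
    n = len(v)
    return [0.5 * ((v[j - 1] if j > 0 else 0.0) + (v[j + 1] if j < n - 1 else 0.0))
            for j in range(n)]

def sine_vector(n, k):
    """Unit vector with entries proportional to sin(j k pi / (n + 1)), j = 1..n."""
    v = [math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def eigen_residual(n, k):
    """Max-norm of B v - cos(k pi / (n + 1)) v for the k-th sine vector."""
    v = sine_vector(n, k)
    lam = math.cos(k * math.pi / (n + 1))
    bv = tridiagonal_half_apply(v)
    return max(abs(bv[j] - lam * v[j]) for j in range(n))

def shift_form(v):
    """Value (Tf, f) for the right shift T and a real vector f."""
    return sum(v[j] * v[j + 1] for j in range(len(v) - 1))
```

For k = 1 the sine vector has positive entries, and `shift_form(sine_vector(n, 1))` returns $\cos(\pi/(n+1))$ up to rounding, so the supremum defining w(T) is attained at this vector.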
We now show that w(T) is an equivalent norm of T. 


Theorem 1.3-1 (Equivalent norm). $w(T) \le \|T\| \le 2w(T)$.


Proof. If $\lambda = \langle Tx, x \rangle$ with $\|x\| = 1$, we have by the Schwarz inequality

\[ |\lambda| = |\langle Tx, x \rangle| \le \|Tx\| \le \|T\|. \]

To prove the other inequality, we use the following identity (polarization
principle), which may be verified by direct computation,

(1.3-5)    \[ 4\langle Tx, y \rangle = \langle T(x + y), x + y \rangle - \langle T(x - y), x - y \rangle
              + i\langle T(x + iy), x + iy \rangle - i\langle T(x - iy), x - iy \rangle. \]

Hence

\[ 4|\langle Tx, y \rangle| \le w(T) \left[ \|x + y\|^2 + \|x - y\|^2 + \|x + iy\|^2 + \|x - iy\|^2 \right]
   = 4w(T) \left[ \|x\|^2 + \|y\|^2 \right]. \]

Choosing $\|x\| = \|y\| = 1$, we have

\[ 4|\langle Tx, y \rangle| \le 8w(T), \]

which implies

\[ \|T\| \le 2w(T). \] □
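
The polarization identity (1.3-5) is easy to test numerically on a random complex matrix and random vectors. The sketch below assumes the inner product is linear in the first argument and conjugate-linear in the second; all function names are hypothetical.

```python
import random

def inner(u, v):
    """Inner product, linear in u and conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(t, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in t]

def quad(t, x):
    """Quadratic form (Tx, x)."""
    return inner(apply(t, x), x)

def polarization_rhs(t, x, y):
    """Right-hand side of (1.3-5)."""
    def comb(c):
        return [a + c * b for a, b in zip(x, y)]
    return (quad(t, comb(1)) - quad(t, comb(-1))
            + 1j * quad(t, comb(1j)) - 1j * quad(t, comb(-1j)))

def polarization_defect(n=4, seed=3):
    """Absolute difference between the two sides of (1.3-5) for random data."""
    rng = random.Random(seed)
    def rc():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))
    t = [[rc() for _ in range(n)] for _ in range(n)]
    x = [rc() for _ in range(n)]
    y = [rc() for _ in range(n)]
    return abs(4 * inner(apply(t, x), y) - polarization_rhs(t, x, y))
```

The identity uses the imaginary unit in an essential way, which is why the conclusion $\|T\| \le 2w(T)$ depends on the scalar field being complex.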


Theorem 1.3-1 implies that T = 0 whenever w(T) = 0. Notice that this 
result is not valid in a real Hilbert space, as the following example shows. 
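Example. Let $H = \mathbb{R} \times \mathbb{R}$ and

\[ T = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]

For $f = (f_1, f_2)$, $\|f\| = 1$, we have $Tf = (-f_2, f_1)$ and

\[ \langle Tf, f \rangle = 0. \]

However, $\|T\| = 1$.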
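
A small sketch contrasting the real and complex settings for this rotation matrix. The function names are hypothetical; the complex test vector $(1, i)/\sqrt2$ is an illustrative choice that is not in the text.

```python
import math

def real_form(angle):
    """(Tf, f) for the real unit vector f = (cos angle, sin angle)."""
    f = (math.cos(angle), math.sin(angle))
    tf = (-f[1], f[0])  # T = [[0, -1], [1, 0]]
    return tf[0] * f[0] + tf[1] * f[1]

def real_image_norm(angle):
    """||Tf|| for the same real unit vector."""
    f = (math.cos(angle), math.sin(angle))
    return math.hypot(-f[1], f[0])

def complex_form():
    """(Tf, f) for the complex unit vector f = (1, i)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    f = (complex(s, 0), complex(0, s))
    tf = (-f[1], f[0])
    return tf[0] * f[0].conjugate() + tf[1] * f[1].conjugate()
```

On real vectors the form vanishes identically while the image always has norm 1. Once complex vectors are admitted, the same matrix takes the value $-i$ at $(1, i)/\sqrt2$, so its numerical radius over $\mathbb{C}^2$ is not zero.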

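Let us now look at two extreme cases of the inequality in Theorem 1.3-1.
In the following, we refer to the spectral radius $r(T) = \sup\{ |\lambda| : \lambda \in \sigma(T) \}$
and the point spectrum $\sigma_p(T) = \{ \lambda \in \sigma(T) : Tf = \lambda f \text{ for some nonzero } f \in H \}$.

Theorem 1.3-2. If $w(T) = \|T\|$, then

\[ r(T) = \|T\|. \]


Proof. Let $w(T) = \|T\| = 1$. Then there is a sequence of unit vectors
$\{f_n\}$ such that $\langle T f_n, f_n \rangle \to \lambda \in \overline{W(T)}$, $|\lambda| = 1$. From the inequality

\[ |\langle T f_n, f_n \rangle| \le \|T f_n\| \le 1, \]

we have $\|T f_n\| \to 1$. Hence

(1.3-6)    \[ \|(T - \lambda I) f_n\|^2 = \|T f_n\|^2 - \langle T f_n, \lambda f_n \rangle - \langle \lambda f_n, T f_n \rangle + \|f_n\|^2 \to 0. \]

Hence $\lambda \in \sigma_{app}(T)$ and $r(T) = 1$. □


Theorem 1.3-3. If $\lambda \in W(T)$, $|\lambda| = \|T\|$, then $\lambda \in \sigma_p(T)$.


Proof. Let $\lambda = \langle Tf, f \rangle$, $\|f\| = 1$. Then

\[ \|T\| = |\lambda| = |\langle Tf, f \rangle| \le \|Tf\| \le \|T\|. \]

So $|\langle Tf, f \rangle| = \|Tf\| \, \|f\|$. Thus $Tf = \mu f$ for some $\mu \in \mathbb{C}$. However,
$\lambda = \langle Tf, f \rangle = \langle \mu f, f \rangle = \mu$ and hence $Tf = \lambda f$. □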

Let us now look at the other extreme condition, $w(T) = \frac12 \|T\|$. A
sufficient condition for this in terms of $R(T) = \{ Tf : f \in H \}$ and
$R(T^*) = \{ T^* f : f \in H \}$ is given by the following.

Theorem 1.3-4. If $R(T) \perp R(T^*)$, then $w(T) = \frac12 \|T\|$.


Proof. Let $f \in H$, $\|f\| = 1$. We can write $f = f_1 + f_2$, where $f_1 \in N(T)$,
the null space of T, and $f_2 \in \overline{R(T^*)}$. Recall that $\overline{R(T^*)} = N(T)^\perp$ by the
fundamental theorem of linear algebra. Thus we have

\[ \langle Tf, f \rangle = \langle T(f_1 + f_2), f_1 + f_2 \rangle = \langle T f_2, f_1 \rangle \]

since $T f_1 = 0$ and $\langle T f_2, f_2 \rangle = \langle f_2, T^* f_2 \rangle = 0$. Thus

(1.3-7)    \[ |\langle Tf, f \rangle| \le \|T\| \, \|f_2\| \, \|f_1\| \le \frac{\|T\|}{2} \left( \|f_1\|^2 + \|f_2\|^2 \right) = \frac{\|T\|}{2}. \]

Since f is arbitrary, we have

\[ w(T) \le \tfrac12 \|T\| \le w(T). \] □


We have seen one example of an operator T with $w(T) = \frac12 \|T\|$, namely
the two-dimensional shift

\[ S_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \]

The following theorem shows that some operators T with $w(T) = \frac12 \|T\|$
have $S_2$ as a component.


Theorem 1.3-5. If $w(T) = \frac12 \|T\|$ and T attains its norm, then T has a
two-dimensional reducing subspace on which it is the shift $S_2$.


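Proof. Let $\|T\| = 1 = 2w(T) = \|T f_1\| = \|f_1\|$ for some $f_1$. If $f_2 = T f_1$,
we have $\|f_2\|^2 = \langle f_2, f_2 \rangle = \langle f_2, T f_1 \rangle = \langle T^* f_2, f_1 \rangle = \|T f_1\|^2 = \|f_1\|^2$.
Hence, as $\|T^*\| = 1$, we have $T^* f_2 = f_1$. Since, for any $\theta$,
$w(T) = w(e^{i\theta} T) = \frac12$, we have $w(e^{i\theta} T + e^{-i\theta} T^*) \le 1$. However, the last
operator is selfadjoint and hence, by Theorem 1.2-3, we have
$\|e^{i\theta} T + e^{-i\theta} T^*\| \le 1$. In particular, $\|e^{i\theta} T f_2 + e^{-i\theta} T^* f_2\| \le 1$.
Using $T^* f_2 = f_1$ and expanding the norm, we have
$\|T f_2\|^2 \le -2 \operatorname{Re} \left[ e^{2i\theta} \langle T^2 f_2, f_2 \rangle \right]$ for all $\theta$. Hence $T f_2 = 0$.
Similarly, the other inequality $\|e^{i\theta} T f_1 + e^{-i\theta} T^* f_1\| \le 1$ yields $T^* f_1 = 0$.
We then have $\langle f_1, f_2 \rangle = \langle f_1, T f_1 \rangle = \langle T^* f_1, f_1 \rangle = 0$. So $f_1, f_2$ form an
orthonormal basis for a two-dimensional subspace M. From the relations
$T f_1 = f_2$, $T^* f_1 = 0$, $T f_2 = 0$, and $T^* f_2 = f_1$, we find that M is a reducing
subspace for T and that the matrix of T on M is

\[ \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \] □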


The numerical radius, in addition to being an equivalent norm, also
bounds $\|T^n f\|$ for a given f. There has been a considerable amount of
research on finding the best constant C such that $\|T^n f\| \le C \|f\|$ when
$w(T) \le 1$.

Theorem 1.3-6. Let w(T) = 1. For any x, $\|x\| = 1$, $\|T^n x\| \to \ell \le \sqrt2$. If
$\ell = \sqrt2$, then $\|T^n x\| = \sqrt2$ for n = 1, 2, ....


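Proof. We construct a sequence of (n + 1) by (n + 1) determinants $D_n$,
choosing $D_0 = 1$ and

(1.3-8)    \[ D_n = \begin{vmatrix}
1 & -\frac12 \|Tx\|^2 & & & \\
-\frac12 \|Tx\|^2 & \|Tx\|^2 & -\frac12 \|T^2 x\|^2 & & \\
 & -\frac12 \|T^2 x\|^2 & \|T^2 x\|^2 & \ddots & \\
 & & \ddots & \ddots & -\frac12 \|T^n x\|^2 \\
 & & & -\frac12 \|T^n x\|^2 & \|T^n x\|^2
\end{vmatrix}. \]

We will show that, for all n = 1, 2, ..., $D_n \ge 0$, and that the sequence
$D_n / D_{n-1}$ is decreasing. Choosing

\[ y = a_0 x + a_1 Tx + \cdots + a_n T^n x, \]

we have, since $w(T) \le 1$,

\[ |\langle Ty, y \rangle| \le \langle y, y \rangle \]

and hence

(1.3-9)    \[ a_0 a_1 \|Tx\|^2 + a_1 a_2 \|T^2 x\|^2 + \cdots
   + \sum_{\substack{i,j=0 \\ i+1 \ne j}}^{n} a_i a_j \langle T^{i+1} x, T^j x \rangle
   \le a_0^2 + a_1^2 \|Tx\|^2 + \cdots
   + \sum_{\substack{i,j=0 \\ i \ne j}}^{n} a_i a_j \langle T^i x, T^j x \rangle. \]

The cross terms in (1.3-9) can be eliminated by replacing T by $e^{i\theta} T$ and
integrating between 0 and $2\pi$. We thus get the inequality

(1.3-10)    \[ a_0 a_1 \|Tx\|^2 + \cdots + a_{n-1} a_n \|T^n x\|^2
   \le a_0^2 + a_1^2 \|Tx\|^2 + \cdots + a_n^2 \|T^n x\|^2. \]

This shows that the quadratic form (the right hand side of (1.3-10) minus
the left hand side of (1.3-10)) associated with each $D_n$ is nonnegative.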
This fact is now used as follows. Assuming that ||Tx|| Æ 0, we have Dı = 


Tz]? — [Tel > 0. Consider 


1 
(1.3-11) Dn = Dn—1||T"2|| — qPn—allT*2 I’. 


Let Dp = 0 for some k > 1. Then Dk+ı = —§ |[T*t!2||?Dy_1, which is 


impossible unless ||T**!z||? = 0. Hence ||T"z|| = 0 for all n > k. In this 
case, £ = 0. 
Let us now suppose that $D_n > 0$ for all $n$. Then

$$\frac{D_n}{D_{n-1}} = \frac{D_{n-1}\|T^nx\|^2}{D_{n-1}} - \frac14\,\frac{D_{n-2}\|T^nx\|^4}{D_{n-1}} \le \frac{D_{n-1}}{D_{n-2}}, \tag{1.3-12}$$

so the sequence $D_n/D_{n-1}$ is convergent. Let $L$ be its limit. Solving for $\|T^nx\|^2$
in (1.3-12) gives


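$$\|T^nx\|^2 = 2\,\frac{D_{n-1}}{D_{n-2}} \pm 2\left[\left(\frac{D_{n-1}}{D_{n-2}}\right)^{2} - \frac{D_{n-1}}{D_{n-2}}\cdot\frac{D_n}{D_{n-1}}\right]^{1/2} \to 2L. \tag{1.3-13}$$
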
Since we know that

$$L \le \frac{D_1}{D_0} = \|Tx\|^2 - \frac{\|Tx\|^4}{4} = 1 - \left(\frac{\|Tx\|^2}{2} - 1\right)^{2} \le 1, \tag{1.3-14}$$

we may conclude that $\|T^nx\|^2 \to 2L \le 2$. □


Theorem 1.3-7. Let $w(T) = 1$ and $\|T^nx\| = 2$ for some $n$ and unit vector
$x$. Then $\|T^kx\| = \sqrt{2}$ for $k = 1, 2, \ldots, n-1$ and $T^{n+1}x = 0$.


Proof. With the notation of Theorem 1.3-6, we have $\|T^nx\| = 2$. We then
have

$$1 \le \frac{D_{n-1}}{D_{n-2}} \le \frac{D_{n-2}}{D_{n-3}} \le \cdots \le \frac{D_1}{D_0} \le 1. \tag{1.3-15}$$

So $D_{n-1} = D_{n-2} = \cdots = D_1$ and $\|T^kx\|^2 = 2$ for $k = 1, 2, \ldots, n-1$. Also,
$D_n = 0$ and

$$D_{n+1} = \|T^{n+1}x\|^2 D_n - \tfrac14\|T^{n+1}x\|^4 D_{n-1} \ge 0,$$

and so $\|T^{n+1}x\| = 0$. □
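
The recursion (1.3-11) and the extremal behavior in Theorem 1.3-7 can be traced numerically on a small example. The sketch below is not from the text: it assumes the weighted shift on $C^3$ with $Te_1 = \sqrt{2}\,e_2$, $Te_2 = \sqrt{2}\,e_3$, $Te_3 = 0$, for which $w(T) = 1$, and the helper names are hypothetical.

```python
import math

# Hypothetical example (an assumption, not from the text): weighted shift on
# C^3 with T e1 = sqrt(2) e2, T e2 = sqrt(2) e3, T e3 = 0, so that w(T) = 1.
WEIGHTS = [math.sqrt(2.0), math.sqrt(2.0)]


def apply_shift(v):
    """Apply the weighted shift to a vector v of length 3."""
    out = [0.0] * len(v)
    for k, weight in enumerate(WEIGHTS):
        out[k + 1] = weight * v[k]
    return out


def power_norms_sq(x, n_max):
    """Return [||T x||^2, ||T^2 x||^2, ..., ||T^n_max x||^2]."""
    norms, v = [], list(x)
    for _ in range(n_max):
        v = apply_shift(v)
        norms.append(sum(abs(c) ** 2 for c in v))
    return norms


def determinants(norms_sq):
    """Return [D_0, D_1, ...] from D_0 = 1 and the recursion (1.3-11)."""
    d = [1.0]
    for n, t in enumerate(norms_sq, start=1):
        # For n = 1 the recursion reduces to D_1 = t - t^2 / 4.
        d_prev2 = d[n - 2] if n >= 2 else 1.0
        d.append(d[n - 1] * t - 0.25 * d_prev2 * t * t)
    return d


if __name__ == "__main__":
    t = power_norms_sq([1.0, 0.0, 0.0], 3)
    print("squared norms:", t)
    print("determinants: ", determinants(t))
```

For $x = e_1$ this gives $\|Tx\|^2 = 2$, $\|T^2x\|^2 = 4$, $T^3x = 0$, and, up to rounding, $D_1 = 1$ and $D_2 = 0$, matching the pattern of Theorem 1.3-7 with $n = 2$.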
The following theorem gives a sufficient condition for an operator to be 
a projection. 


Theorem 1.3-8. If $T$ is idempotent and $w(T) \le 1$, then $T$ is an orthogonal
projection.


Proof. It is sufficient to prove that $T = 0$ on $R(T)^\perp$. Let $x \in R(T)^\perp$ and
$y = Tx$. Then for $t > 0$ we have

$$T(x + ty) = Tx + tT^2x = y + tTx = y + ty.$$

Thus

$$(T(x + ty), x + ty) = ((1 + t)y, x + ty) = ((1 + t)y, ty) = (1 + t)t\|y\|^2$$

as $x \perp y$. On the other hand, we have $(T(x + ty), x + ty) \le \|x + ty\|^2$ as
$w(T) \le 1$. We thus have

$$(1 + t)t\|y\|^2 \le \|x + ty\|^2 = \|x\|^2 + t^2\|y\|^2, \tag{1.3-16}$$

so $t\|y\|^2 \le \|x\|^2$. Since $t$ is arbitrary, we conclude that $\|y\| = 0$. Therefore,
$T = 0$ on $R(T)^\perp$. □
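
To see the hypothesis $w(T) \le 1$ at work, one can compare an oblique idempotent with an orthogonal projection. The sketch below is illustrative only: it uses the standard identity $w(T) = \max_\theta \lambda_{\max}(\operatorname{Re}(e^{i\theta}T))$, samples $\theta$ on a grid, and the two test matrices and function names are our own assumptions.

```python
import cmath
import math


def max_eig_hermitian_part(t, theta):
    """Largest eigenvalue of Re(e^{i theta} T) for a 2x2 matrix T."""
    (a, b), (c, d) = t
    e = cmath.exp(1j * theta)
    # Entries of H = (e T + (e T)^*) / 2; the diagonal is real.
    h11 = (e * a).real
    h22 = (e * d).real
    h12 = (e * b + (e * c).conjugate()) / 2
    mean = (h11 + h22) / 2
    return mean + math.sqrt(((h11 - h22) / 2) ** 2 + abs(h12) ** 2)


def numerical_radius(t, samples=3600):
    """Approximate w(T) by maximizing over a uniform grid of rotation angles."""
    return max(
        max_eig_hermitian_part(t, 2 * math.pi * k / samples)
        for k in range(samples)
    )


# Oblique idempotent (T^2 = T but T is not selfadjoint) versus an
# orthogonal projection; both matrices are illustrative choices.
OBLIQUE = [[1, 1], [0, 0]]
ORTHOGONAL = [[1, 0], [0, 0]]

if __name__ == "__main__":
    print(numerical_radius(OBLIQUE))     # exceeds 1
    print(numerical_radius(ORTHOGONAL))  # equals 1
```

The oblique idempotent has numerical radius $(1 + \sqrt{2})/2 > 1$, so it is excluded by the hypothesis of Theorem 1.3-8, as it must be.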
Notes and References for Section 1.3


The example of $w(T)$ for the right shift operator in $C^n$ is due to
M. Marcus and B. N. Shure (1979). “The Numerical Range of Certain 0, 
1-Matrices,” Linear and Multilinear Algebra 7, 111-120. 

The calculation of the eigenvalues of the matrix B in the same example 
can be found in 
M. Marcus (1960). “Basic Theorems in Matrix Theory,” Nat. Bur. Stan- 
dards Appl. Math., Sec. 57. 

Numerical ranges of weighted shifts were calculated by 
Q. F. Stout (1983). “The Numerical Range of a Weighted Shift,” Proc. 
Amer. Math. Soc. 88, 495-502. 

W. C. Ridge (1976). “Numerical Range of a Weighted Shift with Periodic 
Weights,” Proc. Amer. Math. Soc. 55, 107-110. 

The calculation of the numerical radius of a $2 \times 2$ matrix, though
straightforward, is quite involved. The pertinent formulas were summarized for the
first time in 
C. R. Johnson, I. M. Spitkovsky and S. Gottlieb (1994). “Inequalities 
Involving the Numerical Radius,” Linear and Multilinear Algebra 37, 13- 
24. 

A general form of Theorem 1.3-4 appeared in 
R. Bouldin (1971). “The Numerical Range of a Product II,” J. Math. Anal. 
Appl. 33, 212-219. 

The reducing subspaces of Theorem 1.3-5 for operators with $2w(T) =
\|T\|$ were first given by
J. P. Williams and T. Crimmins (1974). “On the Numerical Radius of a 
Linear Operator,” Amer. Math. Monthly 74, 832-833. 

The best constant $C$ for the inequality $\|T^n f\| \le C\|f\|$ of Theorem 1.3-6
was found to be $\sqrt{2}$ by
M. J. Crabb (1971). “The Powers of an Operator of Numerical Radius 
One,” Mich. Math. J. 18, 252-256. 

Earlier work on this problem appears in 

C. A. Berger and J. G. Stampfli (1967). “Norm Relations and Skew Dila- 
tions,” Acta. Sci. Math. (Szeged) 28, 191-195. 

T. Kato (1965). “Some Mapping Theorems for the Numerical Range,” 
Proc. Japan Acad. 41, 652-655. 

The finite dimensional version of this inequality is rendered more easily 
using the theorem on the ascent of an operator proved by 

N. Nirschl and H. Schneider (1964). “The Bauer Field of Values of a 
Matrix,” Numer. Math. 6, 355-365. 

For more details, see [BD].

Theorem 1.3-8 and further consequences of w(T) < 1 are due to 
T. Furuta and R. Nakamoto (1971). “Certain Numerical Radius Contrac- 
tion Operators,” Proc. Amer. Math. Soc. 29, 521-524. 

The numerical radius of nilpotent operators is discussed in 


U. Haagerup and P. de la Harpe (1992). “The Numerical Radius of a 
Nilpotent Operator on a Hilbert Space,” Proc. Amer. Math. Soc. 115,
371-379. 


1.4 Normal Operators 


From Theorems 1.2-3 and 1.2-4, we see that for a selfadjoint operator T, 
we have 


$$r(T) = w(T) = \|T\|,$$


i.e., equality of spectral, numerical and operator radii. This property gen- 
eralizes to normal operators, as will be seen in the following. In turn, the 
properties of a normal operator related to its numerical range have served 
as the source of a variety of generalizations to other operator classes, which 
will be developed further in Chapter 6. Recall that normal operators, those 
T for which T*T = TT*, may be regarded as a generalization of selfadjoint 
operators T in which T* need not be exactly T but commutes with T. 


Theorem 1.4-1. If W(T) is a line segment, then T is normal. 


Proof. Let $a$ be a point on the line segment with inclination $\theta$. Then
$W(e^{-i\theta}[T - aI])$ is contained in the real axis. Thus $e^{-i\theta}[T - aI]$ is selfadjoint
and so $T$ is normal. □


Theorem 1.4-2. If $T$ is normal, then $\|T^n\| = \|T\|^n$, $n = 1, 2, \ldots$.
Moreover, then

$$r(T) = w(T) = \|T\|. \tag{1.4-1}$$


Proof. For any $x \in H$ with $\|x\| = 1$,

$$\|Tx\|^2 = (T^*Tx, x) \le \|T^*Tx\| = \|T^2x\|, \tag{1.4-2}$$

where the last equality uses $\|T^*y\| = \|Ty\|$ for normal $T$.
Hence $\|T\|^2 \le \|T^2\|$. Since we always have $\|T^2\| \le \|T\|^2$, we conclude
that $\|T^2\| = \|T\|^2$. Now, $\|T^nx\|^2 = (T^*T^nx, T^{n-1}x) \le \|T^*T^nx\|\,\|T^{n-1}x\|
= \|T^{n+1}x\|\,\|T^{n-1}x\|$
for $n = 2, 3, \ldots$. Combining this result with $\|T\|^2 = \|T^2\|$ and using
induction, we prove that $\|T^n\| = \|T\|^n$, $n = 1, 2, \ldots$. Moreover, recalling
that $r(T) = \lim \|T^n\|^{1/n}$, since $\|T^n\| = \|T\|^n$, we have $r(T) = \|T\|$ and
(1.4-1). □
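
Normality cannot simply be dropped here. As a minimal illustration (our own example, not taken from the text), the nilpotent $2 \times 2$ matrix below separates all three radii.

```latex
% N is not normal: N^*N = diag(0,1) while NN^* = diag(1,0).
\[
N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad N^2 = 0 .
\]
% Spectral radius, numerical radius, and norm all differ:
\[
r(N) = 0, \qquad w(N) = \tfrac12, \qquad \|N\| = 1 ,
\]
% and the power identity fails already for n = 2:
\[
\|N^2\| = 0 \ne 1 = \|N\|^2 .
\]
```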


Theorem 1.4-3. Let z be any complex number in the resolvent set of a 
normal operator T. Then 


$$\|(T - zI)x\| \ge d(z, \sigma(T)) \quad \text{for } x \in H,\ \|x\| = 1. \tag{1.4-3}$$

Proof. Since $(T - zI)^{-1}$ is normal, we have

$$\|(T - zI)^{-1}\| = r((T - zI)^{-1}),$$

and so

$$\|(T - zI)^{-1}\| = \sup\{|\lambda| : \lambda \in \sigma((T - zI)^{-1})\}.$$

Using the spectral mapping theorem, we obtain

$$\|(T - zI)^{-1}\| = \frac{1}{d(z, \sigma(T))}. \tag{1.4-4}$$

Thus for any $x \in H$, $\|x\| = 1$, we have

$$d(z, \sigma(T)) = \|(T - zI)^{-1}\|^{-1} \le \|(T - zI)x\|. \tag{1.4-5}$$
□
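
The bound (1.4-3) says that, for a normal $T$, the smallest value of $\|(T - zI)x\|$ over unit vectors is at least the distance from $z$ to the spectrum. A small numerical sketch, assuming $2 \times 2$ matrices and a closed-form smallest singular value (the matrices and helper names are hypothetical), shows the bound holding for a normal matrix and failing for a nilpotent one.

```python
import math


def sigma_min_2x2(m):
    """Smallest singular value of a 2x2 matrix m, i.e. min ||m x|| over unit x."""
    (a, b), (c, d) = m
    frob = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det = abs(a * d - b * c)
    # Squared singular values are the roots of s^2 - frob * s + det^2 = 0.
    disc = max(frob * frob - 4 * det * det, 0.0)
    return math.sqrt(max((frob - math.sqrt(disc)) / 2, 0.0))


def shifted(t, z):
    """Return T - z I for a 2x2 matrix T."""
    (a, b), (c, d) = t
    return [[a - z, b], [c, d - z]]


# Illustrative matrices (assumptions): a normal diagonal matrix with
# spectrum {1, i}, and a nilpotent matrix with spectrum {0}.
NORMAL = [[1, 0], [0, 1j]]
NILPOTENT = [[0, 1], [0, 0]]

if __name__ == "__main__":
    # Normal case: distance from z = 2 to {1, i} is 1, and the bound is attained.
    print(sigma_min_2x2(shifted(NORMAL, 2)))
    # Non-normal case: distance from z = 0.5 to {0} is 0.5, yet the minimum is smaller.
    print(sigma_min_2x2(shifted(NILPOTENT, 0.5)))
```

In the nilpotent case the minimum is roughly $0.207 < 0.5$, so the inequality of Theorem 1.4-3 genuinely depends on normality.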


Theorem 1.4-4. The closure of the numerical range of a normal operator 
is the convex hull of its spectrum. 


Proof. We need only to prove that any closed half-plane in $C$ containing
$\sigma(T)$ also contains $W(T)$. Without loss of generality, we can assume that
$\sigma(T) \subset \{\lambda : \operatorname{Re}\lambda \le 0\}$ and that the imaginary axis is a supporting line for
$\operatorname{co}(\sigma(T))$.

Suppose that $a + ib \in W(T)$ with $a > 0$ and $(Tx, x) = a + ib$, $\|x\| = 1$.
Let $Tx = (a + ib)x + y$, where $(x, y) = 0$. Let $c \in R$, $c > 0$. Then $c \notin \sigma(T)$
and we have

$$d(c, \sigma(T)) \le \|(T - cI)x\|,$$

i.e.,

$$c^2 \le \|(a - c + ib)x + y\|^2 = (a - c)^2 + b^2 + \|y\|^2.$$

Hence $2ac \le a^2 + b^2 + \|y\|^2$ with $a, c > 0$. This is impossible since $c$ is arbitrary. □
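
In finite dimensions the statement can be read off directly. The following worked computation is a sketch under the assumption that $T$ is a diagonal matrix, which is the general finite-dimensional normal case up to unitary equivalence.

```latex
% T = diag(\lambda_1, \dots, \lambda_n) and x = (x_1, \dots, x_n) with \|x\| = 1:
\[
(Tx, x) = \sum_{k=1}^{n} \lambda_k |x_k|^2 ,
\qquad |x_k|^2 \ge 0, \quad \sum_{k=1}^{n} |x_k|^2 = 1 .
\]
% Every value is a convex combination of the eigenvalues, and every convex
% combination is attained by a suitable choice of the weights |x_k|^2, so
\[
W(T) = \operatorname{co}\{\lambda_1, \dots, \lambda_n\} = \operatorname{co}\,\sigma(T) .
\]
```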

The following theorem provides a necessary and sufficient condition for 
the closedness of W (T) of a normal operator in terms of the extreme points 
of W(T). A point z is an extreme point of a set S if z € S and there is a 
closed half-plane containing z and no other element of S. 


Theorem 1.4-5. The extreme points of the closure of the numerical range 
W (T) of a normal operator T are eigenvalues of T if and only if W (T) is 
closed. 


Proof. Let $W(T)$ be closed. We can assume that the extreme point is
$z = 0$ and that $W(T) \subset \{\lambda : \operatorname{Im}\lambda \ge 0\}$ and $(Tx, x) = 0 \in W(T)$; hence
$((T - T^*)x, x) = 0$. Since the operator $\tfrac{1}{i}(T - T^*) \ge 0$, it follows that
$(T - T^*)x = 0$. Consequently, $x$ is an element of the closed subspace
$\{f : Tf = T^*f\} = N$. Since $T$ is normal, we have

$$T^*Tx = TT^*x = TTx,$$

and hence the subspace $N$ is invariant for $T$ and $T|_N$ is selfadjoint.
Obviously, $W(T|_N) \subset W(T)$ and $W(T|_N) \subset R$ by Theorem 1.2-2. Hence
$W(T|_N) \subset W(T) \cap R = \{0\}$, and thus $T|_N = 0$ and $Tx = 0$, i.e., $0 \in \sigma_p(T)$.

The converse is true for any operator $T$. The compact convex set $\overline{W(T)}$
is the convex hull of its extreme points. When the latter are eigenvalues of
$T$, as assumed in the theorem, we have

$$\overline{W(T)} \subset \operatorname{co}(\sigma_p(T)) \subset \operatorname{co}(W(T)) = W(T).$$
□


Notes and References for Section 1.4 


The bound of Theorem 1.4-3 was observed by 
G. H. Orland (1964). “On a Class of Operators,” Proc. Amer. Math. Soc. 
15, 75-79. 

The equality of Theorem 1.4-4 of the convex hull of the spectrum and 
the numerical range was first mentioned in 
M. H. Stone (1932). Linear Transformations in Hilbert Space, American 
Mathematical Society, R.I. 

This property was established later in 
S. K. Berberian (1964). “The Numerical Range of a Normal Operator,” 
Duke Math. J. 31, 479-483. 

The proof we adopted in Theorem 1.4-5 is due to 
J. G. Stampfli (1966). “Extreme Points of the Numerical Range of a Hy- 
ponormal Operator,” Michigan Math. J. 13, 87-89. 

An earlier proof appears in 
C. H. Meng (1957). “A Condition That a Normal Operator Has a Closed 
Numerical Range,” Proc. Amer. Math. Soc. 8, 85-88. 

The relationship of Theorem 1.4-5 to Theorem 1.3-3 should be noted. 
The point is that the numerical ranges of finite-dimensional normal opera- 
tors are polygons whose vertices are eigenvalues. 

Resolvent estimates are key to many aspects and applications of operator 
theory. In particular, certain operator classes have been defined in terms 
of them. These classes will be discussed in Chapter 6. Also, they play 
an important role in numerical analysis, as will be seen in Chapter 4. For 
normal operators, such estimates sometimes become exact, as in (1.4-4), 
for example. For the general case, the following should therefore be kept 
in mind: 


$$d(\lambda, \sigma(T)) \ge \|(T - \lambda I)^{-1}\|^{-1} = \inf_{x \ne 0} \frac{\|(T - \lambda I)x\|}{\|x\|}. \tag{1.4-6}$$


1.5 Numerical Boundary


A natural question regarding the boundary of the numerical range is: which 
points of W(T) are on the boundary? In this section we will characterize 
them and further obtain some properties of special extreme points. 

For each complex number $z$, let $M_z$ be the subset of $H$ given by

$$M_z = \{x : (Tx, x) = z\|x\|^2\}. \tag{1.5-1}$$

Note that $M_z$ is homogeneous, closed and not necessarily linear. Let $\operatorname{sp} M_z$
denote the linear span of $M_z$.


Theorem 1.5-1. If $z \in W(T)$ is an extreme point of $W(T)$, then $M_z$ is
linear.


Proof. Let $L$ be the line of support of $W(T)$ at $z$. Then, for some $\theta$,
$\operatorname{Re}(e^{-i\theta}(T - zI)x, x) \ge 0$ and

$$M_z = \{x : \operatorname{Re}(e^{-i\theta}(T - zI)x, x) = 0\}. \tag{1.5-2}$$

For $x, y \in M_z$,

$$(e^{-i\theta}(T - zI)(x + y), (x + y)) = \alpha = -(e^{-i\theta}(T - zI)(-x + y), (-x + y))$$

is purely imaginary. If $\alpha \ne 0$, we have purely imaginary elements of
$W(e^{-i\theta}(T - zI))$ in both the upper and lower half-planes, and $z$ is not
an extreme point. So $\alpha = 0$ and $x + y \in M_z$. □


Theorem 1.5-2. $z \in W(T)$ is an extreme point if $M_z$ is linear.


Proof. Suppose that $z$ is not an extreme point. Thus $z$ is an interior point
of the line segment joining $a, b \in W(T)$ with $(Tx, x) = a$ and $(Ty, y) = b$,
$\|x\| = \|y\| = 1$. It can be shown that there exist $t \in (0, 1)$ and a complex
$\lambda$, $|\lambda| = 1$, such that $tx + (1 - t)\lambda y \in M_z$. Since $(T(-\lambda y), (-\lambda y)) = b$, we
can similarly assure that for the same $\lambda$ there is an $s \in (0, 1)$ such that
$sx + (1 - s)(-\lambda y) \in M_z$, so $x \in M_z$. But $a \ne z$ and $M_a \cap M_z = \{0\}$, a
contradiction. Hence $z$ is an extreme point. □
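
Theorems 1.5-1 and 1.5-2 can be checked by hand on the simplest possible case. The example below is ours, not the text's: it takes $T = \operatorname{diag}(0, 1)$ on $C^2$, for which $W(T) = [0, 1]$.

```latex
% With x = (x_1, x_2), (Tx, x) = |x_2|^2, so
\[
M_z = \{\, x : |x_2|^2 = z\,(|x_1|^2 + |x_2|^2) \,\}.
\]
% Extreme point z = 0: a linear subspace.
\[
M_0 = \{\, x : x_2 = 0 \,\}.
\]
% Non-extreme point z = 1/2: homogeneous but not linear.
\[
M_{1/2} = \{\, x : |x_1| = |x_2| \,\}, \qquad
(1, 1),\ (1, -1) \in M_{1/2}, \quad (1,1) + (1,-1) = (2, 0) \notin M_{1/2}.
\]
```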


Theorem 1.5-3. Let $z$ be a nonextreme boundary point of $W(T)$ and $L$
the line of support at $z$ with inclination $\theta$. Then

$$N = \bigcup\{M_\alpha : \alpha \in L\}$$

is a closed subspace of $H$ and $\operatorname{sp} M_z = N$.


Proof. For all $\alpha \in L$, $\arg(\alpha - z) = \theta$ is constant and $e^{-i\theta}(\alpha - z)$ is real.
Consequently,

$$N = \{x : (e^{-i\theta}(T - zI)x, x) \text{ is real}\}.$$

Since $L$ is a line of support, we have either $\operatorname{Im} e^{-i\theta}(T - zI) \ge 0$ or
$\operatorname{Im} e^{-i\theta}(T - zI) \le 0$. Assuming the former, we have a selfadjoint operator
$\operatorname{Im} e^{-i\theta}(T - zI) = B$ and

$$N = \{x : (Bx, x) = 0\} = \{x : Bx = 0\}$$

or equivalently,

$$N = \{x : e^{-i\theta}(T - zI)x = e^{i\theta}(T^* - \bar{z}I)x\}.$$

So $N$ is a closed subspace of $H$. As in Theorem 1.5-2, we have for each
$\alpha \in L$,

$$M_\alpha \subset M_z + M_z = \operatorname{sp} M_z, \tag{1.5-3}$$

and hence $N \subset \operatorname{sp} M_z$. Since $M_z \subset N$ and $N$ is a subspace, we have
$\operatorname{sp} M_z \subset N$. □

So far, we have looked at $\partial W(T) \cap W(T)$. Attempts to describe the
unattained boundary points $\overline{W(T)} - W(T)$ lead to the consideration of
special sequences of vectors in $H$. Notice that for any $z \in \partial W(T)$ there is
always a sequence of unit vectors $\{f_n\}$ such that $(Tf_n, f_n) \to z$. Since
the closed unit ball is weakly compact, by taking subsequences, we can assume
that $f_n \rightharpoonup f$ weakly and $(Tf_n, f_n) \to z$. Let us now define a special property
(P) of such sequences as follows. Let $\{f_n\}$ be a sequence of unit vectors in
$H$, $f_n \rightharpoonup f$ weakly and $(Tf_n, f_n) \to z$. Then $\{f_n\}$ is said to have property (P) if
either $f = 0$ or $(Tf, f) = z\|f\|^2$.

Theorem 1.5-4. If $z$ is an extreme point of $\overline{W(T)}$ and $(Tf_n, f_n) \to z$,
$\|f_n\| = 1$, then the sequence $\{f_n\}$ has the property (P).


Proof. Let $g_n = f_n - f$, so that $g_n \rightharpoonup 0$ weakly and $\|g_n\| \le 2$. Hence, by considering
subsequences, we may assume that $\|g_n\| \to a \in R$. Since $1 = \|f_n\|^2 =
\|g_n + f\|^2 = \|g_n\|^2 + \|f\|^2 + 2\operatorname{Re}(g_n, f)$, letting $n \to \infty$ gives
$a^2 + \|f\|^2 = 1$. Moreover, we have

$$(Tf_n, f_n) = (Tg_n, g_n) + (g_n, T^*f) + (Tf, g_n) + (Tf, f) \to z.$$

Since $g_n \rightharpoonup 0$ weakly, $(Tg_n, g_n) \to z - (Tf, f)$.
If $a \ne 0$, let $(Tf, f) = \alpha\|f\|^2$ and

$$\delta_n = \frac{(Tg_n, g_n)}{\|g_n\|^2}.$$

Then $\delta_n \to \lambda \in \overline{W(T)}$, and we have $(Tg_n, g_n) \to \lambda a^2$ and $z = \alpha\|f\|^2 + \lambda a^2$.
Since $a^2 + \|f\|^2 = 1$, we have that $z$ lies on the segment joining $\alpha$ and $\lambda$,
both belonging to $\overline{W(T)}$. So $z$ cannot be an extreme point unless either
$\lambda = z$ or $\alpha = z$. In either case, $z = \alpha$ and $(Tf, f) = z\|f\|^2$. So $\{f_n\}$ has
the property (P). □


A point $z \in W(T)$ is called a corner if $W(T)$ is contained in a half-cone
with vertex at $z$, and the semivertical angle of the cone is less than $\pi/2$.

Theorem 1.5-5. If $z \in W(T)$ is a corner of $W(T)$, then $z \in \sigma_p(T)$.


Proof. Let $z$ be a corner and $(Tf, f) = z$, $\|f\| = 1$. If $Tf \ne \lambda f$ for
any $\lambda$, then $f$ and $Tf$ are linearly independent. Let $E$ be the orthogonal
projection onto the subspace spanned by $f$ and $Tf$. $W(ETE)$ is an ellipse
contained in $W(T)$. Since $z \in W(ETE)$, $z$ is contained in the ellipse. This
is impossible since $z$ is a corner unless the ellipse is degenerate and $z$ is an
eigenvalue $\mu$ of $ETE$. But then $ETEf = \mu f = Tf$, a contradiction. □


Corollary 1.5-6. If $z \in \overline{W(T)}$ is a corner of $\overline{W(T)}$, then $z \in \sigma_{app}(T)$.


Proof. Embed (see Notes) $H$ in a larger Hilbert space $K$ and extend $T$ to a
bounded linear operator $\tilde{T}$ on $K$ such that $\sigma_{app}(T) = \sigma_p(\tilde{T})$. Then $\tilde{T}x = \lambda x$
for some $x \ne 0$. If $x_n \to x$ and $Tx_n \to \tilde{T}x$, we have $\|(T - \lambda I)x_n\| \to 0$. □


Corollary 1.5-7. If $W(T)$ is a closed polygon, $W(T) = \operatorname{co}\sigma(T)$.
Proof. By Theorem 1.5-5, the vertices of the polygon are in $\sigma(T)$. □


Corollary 1.5-8. A compact polygon with $m$ vertices is the numerical
range of an operator in an $n$-dimensional space iff $m \le n$.


Proof. If the numerical range polygon has $m$ vertices, they are corners.
By Theorem 1.5-5, each of them is in $\sigma_p(T)$ and there can be at most
$n$ of them, by the linear independence of their corresponding eigenvectors.
Conversely, if we consider the normal operator $T$ represented by the matrix
$A = [a_{ij}]$ with elements


then $W(T) = \operatorname{co}\sigma(T)$. □


Notes and References for Section 1.5 


The geometry of the numerical range and its boundary, especially W (T) — 
Int W(T), was studied by 

J. Agler (1982). “Geometric and Topological Properties of the Numerical 
Range,” Indiana Univ. Math. J. 31, 766-767. 

The fact that any subset $G$ of $C$ can be the numerical range of some
bounded operator, if $G - \operatorname{Int} G = E_0 \cup E_1$, where $E_0$ is countable and $E_1$ is
a union of smooth subarcs of a conic section, was shown by
M. Radjabalipour and H. Radjavi (1975). “On the Geometry of Numerical
Range,” Pacific J. Math. 61, 507-511.

The definition of $M_z$ and their use in relating extreme points are studied
by
M. Embry (1970). “The Numerical Range of an Operator,” Pacific J. 
Math. 32, 647-650. 

Notice that both of the Theorems 1.5-1 and 1.5-2 need $z \in W(T)$. The
case when $z \in \overline{W(T)} - W(T)$ was studied by

B. Sims (1972). “On the Connection Between the Numerical Range and 
Spectrum of an Operator in a Hilbert Space,” J. London. Math. Soc. 8, 
57-59. 

K. C. Das (1977). “Boundary of Numerical Range,” J. Math. Anal. Appl. 
60, 779-780. 

and 

G. Garske (1979). “The Boundary of the Numerical Range of an Operator,” 
J. Math. Anal. Appl. 68, 605-607. 

We will speak more about dilation theory in the next chapter. However, 
the embedding argument of Corollary 1.5-6 follows from a construction in 
S. K. Berberian (1962). “Approximate Proper Vectors,” Proc. Amer. 
Math. Soc. 118, 111-114. 

S. K. Berberian and G. H. Orland (1967). “On the Closure of the Numerical 
Range of an Operator,” Proc. Amer. Math. Soc. 18, 499-503. 

The case of compact operators was studied extensively by 

G. de Barra, J. R. Giles and B. Sims (1972). “On the Numerical Range of 
Compact Operators on Hilbert Spaces,” J. Lond. Math. Soc. 5, 704-706.

That the extreme points of $W(T)$ lie in the spectrum for a normal operator
$T$ was shown by

C. R. Macluer (1965). “On Extreme Points of the Numerical Range of 
Normal Operators,” Proc. Amer. Math. Soc. 16, 1183-1184. 

The corresponding result for a hyponormal operator (see Chapter VI) was 
given by 

J. G. Stampfli (1966). “Extreme Points of the Numerical Range of Hy- 
ponormal Operators,” Michigan Math. J. 13, 87-89. 

The same result for compact operators was given in 

G. de Barra (1981). “The Boundary of the Numerical Range,” Glasgow 
Math. J. 22, 69-72. 

Much earlier 

W. F. Donoghue (1957). “On the Numerical Range of a Bounded Opera- 
tor,” Michigan Math. J. 4, 261-263. 

proved that when W(T) is closed, a point on the boundary at which it is
not a differentiable arc is an eigenvalue of T.


1.6 Other W-Ranges


The concept of the numerical range $W(T)$ originated from the real line
segment $[\lambda_{\min}, \lambda_{\max}]$ of quadratic form values $(Tx, x)$ of a symmetric
matrix $T$, the continuous segment being fully obtained by the quadratic form
values as $x$ ranges over the unit sphere $\|x\| = 1$. The numerical range
$W(T)$ defined the same way for arbitrary bounded linear operators $T$ on a
finite- or infinite-dimensional Hilbert space is now an established chapter
in Hilbert space linear operator theory.

Generalizations from there have gone in roughly two directions. From 
the matrix analysis viewpoint, a number of variations on W(T) have been 
investigated in finite dimensions. From the operator theory viewpoint, the 
principal generalization has been the extension to Banach spaces based 
upon the Hahn—Banach theorem and the notion of a semi-inner product. 

The matrix analysis variations that have been and are currently being
studied include the $k$-numerical range (a joint numerical range); its
generalization, the $C$-numerical range (which has recently received the most
attention); what we may call the sesquilinear-numerical range
$\{(A^{1/2}Tx, A^{1/2}x),\ \|x\| = 1\}$, which is just the usual numerical range of $T$ but now in the
inner product coming from the sesquilinear form $(Ax, x)$, where $A$ is any
specified positive definite operator (this is sometimes called the
generalized field of values); the $M$ numerical range (based on resolvent growths);
restricted numerical ranges (in which $x$ is restricted to only portions of
the unit sphere in the Hilbert space $H$); symmetrizations of the numerical
range; and others. These will be discussed further in Chapter 5.

The Banach space generalization in operator theory relies on the Hahn- 
Banach theorem, from which a standard interesting corollary is as follows. 


Lemma 1.6-1. For every vector $x$ in a normed linear space $X$, there exists
a linear functional $x^*$ in the dual space $X^*$ such that

$$x^*(x) = \|x\|^2 = \|x^*\|^2.$$

From this we may define a semi-inner product. A semi-inner product is
a mapping $[\,\cdot\,, \cdot\,]$ from $X \times X$ to the scalars such that

(i) $[x, y]$ is linear in $x$ for each fixed $y$,

(ii) $[x, x] > 0$ when $x \ne 0$,

(iii) $|[x, y]|^2 \le [x, x][y, y]$.

We shall also always assume that the semi-inner product is consistent with
the norm: $[x, x] = \|x\|^2$. Clearly, properties (i)-(iii) generalize those of the
usual Hilbert space inner product, the only loss being the bilinearity.

Example. Given a Banach space $X$, let a semi-inner product $[x, y]$ be
defined by

$$[x, y] = y^*(x),$$

where $y^*$ is a linear functional corresponding to $y$ as guaranteed in Lemma
1.6-1.
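
For a concrete instance of this construction, one standard choice on the sequence space $\ell^p$ with $1 < p < \infty$ is $[x, y] = \|y\|_p^{2-p} \sum_k x_k \overline{y_k}\,|y_k|^{p-2}$. This formula is not taken from the text; the sketch below assumes finite vectors and hypothetical function names, and simply checks properties (i)-(iii) and the consistency $[y, y] = \|y\|_p^2$ numerically.

```python
def p_norm(v, p):
    """The l^p norm of a finite vector v."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def semi_inner(x, y, p):
    """Semi-inner product [x, y] on l^p (1 < p < infinity).

    Linear in x for fixed y, with [y, y] = ||y||_p^2.
    """
    norm_y = p_norm(y, p)
    if norm_y == 0.0:
        return 0.0
    total = sum(
        complex(xk) * complex(yk).conjugate() * abs(yk) ** (p - 2)
        for xk, yk in zip(x, y)
        if yk != 0
    )
    return total * norm_y ** (2 - p)


if __name__ == "__main__":
    p = 3.0
    x = [1 + 2j, -1, 0.5]
    y = [2, 1j, -3]
    print(semi_inner(y, y, p), p_norm(y, p) ** 2)  # consistency with the norm
    print(abs(semi_inner(x, y, p)), p_norm(x, p) * p_norm(y, p))  # Schwarz-type bound
```

For $p = 2$ the formula reduces to the usual inner product; for $p \ne 2$ it is no longer conjugate-symmetric, which is exactly the loss noted above.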

Whether or not a Banach space $X$ has more than one such semi-inner
product depends on the geometry of the unit balls in $X$ and $X^*$. For any
chosen semi-inner product, one then defines the spatial numerical range as
usual,

$$W(T) = \{[Tx, x] : \|x\| = 1\}.$$

When there is more than one semi-inner product, one defines the (total)
spatial numerical range to be

$$V(T) = \bigcup_{\alpha} W_\alpha(T), \tag{1.6-1}$$

where $\alpha$ indexes all semi-inner products on $X$. These Banach space
numerical ranges (and similar Banach algebra variations) are fully exposited
in [BD]. In matrix theory, $V(A)$ is called $F(A)$, the Bauer field of values.

Important properties for everywhere defined, bounded operators T in- 
clude the following: the closure of V(T) contains not only the spectrum 
o(T) but also its convex hull; V (T) need not be convex but is connected; 
and the numerical radius |V (T)| is (complex scalar field assumed now, of 
course) an equivalent norm and obeys the inequalities 


$$|V(T)| \le \|T\| \le e\,|V(T)|. \tag{1.6-2}$$


The spectral inclusion property fully generalizes that of the Hilbert space 
case, Theorem 1.2-1. The connectedness property captures the essence of 
Theorem 1.1-2. The norm equivalence inequality, (1.6-2), corresponds to 
that of Theorem 1.3-1, with the ratio 2 replaced by e, which is known 
to be sharp for Banach spaces. Many of the normal and numerical range 
boundary properties of Sections 1.4 and 1.5, respectively, have also been 
generalized to some extent to operators T on Banach spaces and in Banach 
algebras. See [BD] for more information. 


Notes and References for Section 1.6 


Generalizations of the numerical range in the finite-dimensional case are 
summarized in [HJ2]. We shall discuss their properties further in Chapter 
5. An early influential paper was that of 

F. Bauer (1962). “On the Field of Values Subordinate to a Norm,” Nu- 
merische Math. 4, 103-111. 

Because the Banach space generalizations are treated in full in [BD], we 
shall not treat them in this book. Let us note that the [BD] expositions 
are further updated in 
F. Bonsall and J. Duncan (1980). “Numerical Ranges,” in Studies in
Mathematics 21, ed. R. G. Bartle, Mathematical Association of America, 1-49.

For a speculation that the $|V(T)|$ radius and operator norm $\|T\|$ must
go to infinity together, i.e., that $T$ is bounded if and only if the numerical
range $V(T)$ is bounded, see
K. Gustafson and B. Zwahlen (1974). “On Operator Radii,” Acta Sci. 
Math. 36, 63-68. 

This proposition is probably related to still not fully understood connec- 
tivity properties of V(T). 

For relationships between the semi-inner product structures of Banach 
spaces and partial inner product structures of rigged Hilbert spaces, and a 
far reaching generalization of such structures to Galois connections, see 
J. P. Antoine and K. Gustafson (1981). “Partial Inner Product Spaces and 
Semi-inner Product Spaces,” Advances in Mathematics 41, 281-300. 


Endnotes for Chapter 1 


Although our general stance in this book will be that of bounded, ev- 
erywhere defined linear operators on a complex Hilbert space, it is worth 
keeping in mind three different general contexts for the numerical ranges 
W(T): 

1. finite-dimensional case, T given by a matrix. W(T) is a compact
(closed and bounded) convex set; 

2. infinite-dimensional case, T bounded and given, for example, by an 
infinite matrix or an integral. W (T) is convex, bounded, but not necessarily 
closed; 

3. infinite-dimensional case, T unbounded (either closed or unclosed) 
and given, for example, by a singular integral or a differential equation. 
W (T) is convex but unbounded and not necessarily closed. 

In all three contexts (taking $T$ closed and densely defined in the third,
to keep it simple), the spectrum $\sigma(T)$ is a closed set, a great convenience
indeed. However, the spectrum $\sigma(T)$ need not be contained in $W(T)$ in the
third instance, and the spectrum $\sigma(T)$ need not be a point spectrum (i.e.,
eigenvalues) in the second instance. There are other distinctions between
these three situations, and we don’t wish to belabor them. Our point is 
that the “operator theory community” is really three communities. Their 
outlooks differ. Most importantly, the methods of proof may also differ, 
due to distinctions in the three situations such as those we have just noted. 

Often, nearly identical theorems have been proved independently by the 
three communities. It is beyond our effort here to be sure that we have 
identified those instances. In fact we believe that there are sufficiently 
many such instances that we decided not to even try to account for them 
in any systematic way. However, let us give one important example of this 
so that our point is clear. The assumption that

$$0 \notin \overline{W(T)}$$

is strong. It means that a complex rotation $e^{i\theta}T$ of $T$ has numerical range
(and spectrum) contained in a closed right half-plane $\operatorname{Re} z \ge \lambda_0 > 0$. This
assumption has yielded a number of interesting results in matrix theory
and continues to do so. From the other viewpoint of operator theory,
such operators, generally unbounded, are called (when rotated) m-accretive
operators and have been treated extensively as infinitesimal generators in
the theory of operator semigroups. The weaker condition that $\operatorname{Re}(Tx, x) \ge 0$,
already used in the proof of Theorem 1.5-1 and to be used extensively
in the next chapter and often in later chapters, is called accretive in the
semigroup theory, where it is equally useful.

One of the beauties of the numerical range W(T), just as with the spec- 
trum o(T), is that its use and properties do indeed extend over all three 
contexts: matrix analysis, operator theory, and differential equations. Our 
goal in writing this book is to expose the basic properties of W (T) common 
to all three contexts, principally within the setting of bounded operators 
T on a complex Hilbert space H. 

* * *

As described in Section 1.6, there is a fourth context for the numerical 
range: the generalized numerical range V (T) for T a densely defined op- 
erator in a Banach space. Actually, that context itself divides into two 
cases: 

4. Banach space case: finite- or infinite-dimensional T, bounded or un- 
bounded, spatial numerical range V (T); 

5. Banach algebra case: T an element of a normed unital algebra,
algebraic numerical range v(T).

Of course, there are strong overlaps between contexts 4 and 5, but the 
outlooks are significantly different. In the first, T is usually regarded as 
known to us, and one explores the various properties of V (T), especially as 
they may resemble or differ from those of W (T) in the Hilbert space case. 
In the second, the normed algebras are regarded as the items of interest, 
and the algebraic numerical range is regarded as a tool with which to learn 
more about the algebra. 

There is in fact a sixth context for the numerical range: 

6. Nonlinear operator theory: T a nonlinear operator in a Hilbert or 
Banach space. 

Some of the W (T) and V(T) results extend to nonlinear operators T in 
interesting ways. For example, the Bonsall, Cain and Schneider (see [BD]) 
result that V (T) remains connected depends only on the continuity of T 
and not on its linearity. However, we have taken the view that the core 
theory is linear, and indeed nonlinear operator theories are often quite 
specialized. Examples of the nonlinear numerical range theories may be 
found in 

E. Zarantonello (1967). “The Closure of the Numerical Range Contains
the Spectrum,” Pacific J. Math. 22, 575-595.

H. Brezis (1973). Opérateurs Maximaux Monotones et Semi-groupes de
Contractions dans les Espaces de Hilbert, North Holland, Amsterdam.

V. Barbu (1976). Nonlinear Semigroups and Differential Equations in Ba- 
nach Spaces, Editura Academie, Bucharest, Rumania. 

I. Miyadera (1992). Nonlinear Semigroups, Amer. Math. Soc., Providence,
R.I.


Mapping Theorems


Introduction 


Mapping theorems for the numerical range analogous to the spectral map- 
ping theorem are hard to come by. The analogy is rather limited by the con- 
vexity of the numerical range. However, significant results were obtained 
in relating the numerical ranges and the numerical radii of an operator T 
to those of the operator $f(T)$, where $f$ is a given function. As can be
expected, the best results were obtained in the special case $f(T) = T^n$, $n$
a natural number. In addition to the preceding results, this chapter also 
gives some other results for the numerical range of products, commuting 
operators, and the natural connections between the numerical range and 
the theory of dilations of operators. 

Let us first observe that even $W(T)$ and $W(T^2)$ need not be related in a
simple way. We can see in the following example that $W(T)$ is contained
in the sector $\{-\pi/4 \le \theta \le \pi/4\}$ and $W(T^2)$ is not contained in the sector
$\{-\pi/2 \le \theta \le \pi/2\}$.


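Example. Let $H = C^2$ and

$$T = \begin{bmatrix} 4+i & 4i \\ 4i & 16+i \end{bmatrix}.$$

For $u = (x, y) \in C^2$, we have

$$\operatorname{Re}(Tu, u) = 4|x|^2 + 16|y|^2,$$
$$|\operatorname{Im}(Tu, u)| \le 3|x|^2 + 9|y|^2 \le \operatorname{Re}(Tu, u).$$

However, for $u = (1, 0)$, $(T^2u, u) = -1 + 8i$.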
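
The example can be checked numerically. The sketch below takes $T$ to be the matrix with rows $(4+i,\ 4i)$ and $(4i,\ 16+i)$; the entry $16 + i$ is our reading of the display and should be treated as an assumption. It tests $|\operatorname{Im}(Tu,u)| \le \operatorname{Re}(Tu,u)$ on a grid of unit vectors and evaluates $(T^2u, u)$ at $u = (1, 0)$; the function names are hypothetical.

```python
import cmath
import math

# Our reading of the example matrix; the (2,2) entry 16 + i is an assumption.
T = [[4 + 1j, 4j], [4j, 16 + 1j]]


def apply(m, u):
    """Matrix-vector product for a 2x2 matrix m and a 2-vector u."""
    return [m[0][0] * u[0] + m[0][1] * u[1], m[1][0] * u[0] + m[1][1] * u[1]]


def inner(u, v):
    """Inner product (u, v), linear in the first argument."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def form(m, u):
    """The value (m u, u)."""
    return inner(apply(m, u), u)


def form_of_square(m, u):
    """The value (m^2 u, u)."""
    return inner(apply(m, apply(m, u)), u)


def stays_in_quarter_sector(m, steps=60):
    """Check |Im (m u, u)| <= Re (m u, u) on a grid of unit vectors u."""
    for a in range(steps + 1):
        t = 0.5 * math.pi * a / steps
        for b in range(steps):
            phase = cmath.exp(2j * math.pi * b / steps)
            q = form(m, [math.cos(t), phase * math.sin(t)])
            if abs(q.imag) > q.real + 1e-12:
                return False
    return True


if __name__ == "__main__":
    print(stays_in_quarter_sector(T))
    print(form_of_square(T, [1, 0]))
```

The value at $u = (1, 0)$ has negative real part, so $W(T^2)$ leaves the right half-plane even though $W(T)$ stays inside the quarter sector.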


K. E. Gustafson et al., Numerical Range

2.1 Radius Mapping


Simple relations between w(T) and w(f(T)) can be obtained in special 
cases. The best-known example is the power inequality. 


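Theorem 2.1-1 (Power inequality). Let $T$ be an operator and $w(T) \le 1$.
Then $w(T^n) \le 1$, $n = 1, 2, 3, \ldots$.


Proof. Let us first observe that for any $z \in C$, $|z| < 1$,

$$\operatorname{Re}((I - zT)x, x) = \|x\|^2 - \operatorname{Re}(zTx, x) \ge \|x\|^2[1 - |z|] \ge 0$$

follows from $w(T) \le 1$. On the other hand, when $\operatorname{Re}((I - zT)x, x) \ge
0$ for all $z \in C$, $|z| < 1$, we have, taking $z = te^{i\theta}$ and letting $t \to 1$,
$\operatorname{Re}(e^{i\theta}Tx, x) \le \|x\|^2$ and hence $w(T) \le 1$. Thus, whenever $I - zT$ is
invertible, we have $\operatorname{Re}((I - zT)x, x) \ge 0$, $\forall x \in H$ $\Leftrightarrow$ $\operatorname{Re}((I - zT)^{-1}y, y) \ge
0$, $\forall y \in H$, using $x = (I - zT)^{-1}y$. Notice that $r(T) \le 1$ and that $I - zT$
is invertible.

Thus, it is sufficient to prove that for all $x \in H$

$$\operatorname{Re}((I - z^nT^n)^{-1}x, x) \ge 0 \quad \text{with } z \in C,\ |z| < 1.$$

We now use the identity

$$(I - z^nT^n)^{-1} = \frac{1}{n}\left[(I - zT)^{-1} + (I - \omega zT)^{-1} + \cdots + (I - \omega^{n-1}zT)^{-1}\right],$$

where $\omega$ is a primitive $n$th root of 1. Since for each of the operators
$(I - \omega^kzT)^{-1}$ we have

$$\operatorname{Re}((I - \omega^kzT)^{-1}x, x) \ge 0 \quad \text{for all } x \in H,$$

we deduce that

$$\operatorname{Re}((I - z^nT^n)^{-1}x, x) \ge 0 \quad \text{for all } x \in H.$$

Thus $\operatorname{Re}((I - z^nT^n)x, x) \ge 0$, $\forall x \in H$, $|z| < 1$, and therefore $w(T^n) \le
1$. □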
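
The resolvent identity at the heart of this proof is a partial-fraction statement and holds for any $T$ and $z$ for which the inverses exist. A minimal numerical sketch, assuming $2 \times 2$ complex matrices and hypothetical helper names, compares the two sides entry by entry.

```python
import cmath


def mat_mul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(m):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def one_minus(scalar, m):
    """Return I - scalar * m."""
    return [[(1 if i == j else 0) - scalar * m[i][j] for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    """Return m^n for a nonnegative integer n."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def resolvent_of_power(t, z, n):
    """Left-hand side: (I - z^n T^n)^{-1}."""
    return mat_inv(one_minus(z ** n, mat_pow(t, n)))


def averaged_resolvents(t, z, n):
    """Right-hand side: (1/n) sum_k (I - w^k z T)^{-1}, w = exp(2 pi i / n)."""
    w = cmath.exp(2j * cmath.pi / n)
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        r = mat_inv(one_minus(w ** k * z, t))
        total = [[total[i][j] + r[i][j] for j in range(2)] for i in range(2)]
    return [[entry / n for entry in row] for row in total]


if __name__ == "__main__":
    t = [[0.3, 1.0], [0.0, -0.5j]]  # illustrative matrix, not from the text
    z = 0.4 + 0.2j
    print(resolvent_of_power(t, z, 3))
    print(averaged_resolvents(t, z, 3))
```

The two printed matrices agree to rounding error, which is all the proof needs: each summand on the right has nonnegative real part, hence so does their average.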
Conditions equivalent to $w(T) \le 1$, like the one in the preceding proof,

$$w(T) \le 1 \Leftrightarrow \operatorname{Re}((I - zT)x, x) \ge 0 \Leftrightarrow \operatorname{Re}((I - zT)^{-1}x, x) \ge 0, \tag{2.1-1}$$

for $x \in H$, $|z| < 1$, play an important role in mapping theorems for the
numerical radius. Some other obvious equivalent conditions are


(2.1-2) $\operatorname{Re}((I + zT)x, x) \ge 0$, $\forall x \in H$, $|z| < 1$,
(2.1-3) Re (zT(I — 2zT)~‘z,x) >0, Vaed, |z| <1. 


See the Notes and References for further equivalent conditions.

The following theorem provides a dilation condition equivalent to $w(T) \le
1$, which is quite useful in mapping theorems.


Theorem 2.1-2 (Power dilation). $w(T) \le 1$ if and only if $T^n = 2PU^nP$
for $n = 1, 2, \ldots$, where $U$ is a unitary operator on a Hilbert space $K \supset H$
and $P$ is the projection of $K$ on $H$.

Proof. Assuming that $w(T) \le 1$, we have $r(T) \le 1$, and the operator $(I - zT)^{-1}$ exists for $|z| < 1$. The operator valued function $F(z) = (I - zT)^{-1}$ is holomorphic in the disk $\{|z| < 1\}$, $F(0) = I$ and $\operatorname{Re} F(z) \ge 0$. It follows (see the Notes and References; see also Section 2.6) by a theorem of Riesz, generalized to operator valued functions, that there exist a Hilbert space $K \supset H$ and a unitary operator $U$ in $K$ such that

$F(z) = P(I_K + zU)(I_K - zU)^{-1}$, $|z| < 1$.
By series expansion,
$F(z) = P[I_K + 2zU + 2z^2U^2 + \cdots] = I_H + zT + z^2T^2 + \cdots$.
Equating coefficients, we obtain
(2.1-4) $T^n = 2PU^nP$ for $n = 1, 2, 3, \ldots$.


The series $I_H + zT + z^2T^2 + \cdots$ converges for $|z| < 1$ and equals $(I_H - zT)^{-1}$. On the other hand, the existence of the Hilbert space $K$ and that of the unitary operator $U$ such that $T^n = 2PU^nP$ imply that the series $I_K + 2zU + 2z^2U^2 + \cdots$ converges for $|z| < 1$. Consequently, the series

$I_H + zT + z^2T^2 + \cdots$

converges in norm for $|z| < 1$ and equals $(I - zT)^{-1}$.
Evaluating

$\langle (I_K + zU)(I_K - zU)^{-1}y, y\rangle$
and taking $y = (I_K - zU)x$, we obtain

(2.1-5) $\operatorname{Re}\,\langle (I_K + zU)(I_K - zU)^{-1}y, y\rangle = \operatorname{Re}\,\langle (I_K + zU)x, (I_K - zU)x\rangle = (1 - |z|^2)\|x\|^2 \ge 0$
for $|z| < 1$. Hence $\operatorname{Re}\,\langle (I - zT)^{-1}x, x\rangle \ge 0$ for $|z| < 1$. Thus $w(T) \le 1$, using (2.1-1). □
One of the immediate applications of the preceding theorem is to relate $w(T)$ and $w(f(T))$ when $f$ is analytic.


Theorem 2.1-3. Let $f$ be analytic in $|z| < 1$ and continuous on the boundary $|z| = 1$, with $f(0) = 0$. If $|f(z)| \le 1$ for $|z| \le 1$ and $w(T) \le 1$, then $w(f(T)) \le 1$.
Proof. We first prove that $\lim_{r \to 1} f(rT)$ exists in the norm. Let $U = \int e^{i\theta}\,dE(\theta)$ and $T^n = 2PU^nP$ for $n = 1, 2, \ldots$. Then

(2.1-6) $$\begin{aligned} f(rT) = \sum a_n r^n T^n &= 2P\Big(\sum a_n r^n U^n\Big)P \\ &= 2P\Big[\int \sum a_n r^n e^{in\theta}\,dE(\theta)\Big]P \\ &= 2P\Big[\int f(re^{i\theta})\,dE(\theta)\Big]P. \end{aligned}$$
By the continuity of $f$, we may conclude that
(2.1-7) $$\lim_{r \to 1} f(rT) = 2P\Big[\int f(e^{i\theta})\,dE(\theta)\Big]P = 2Pf(U)P.$$

It is easy to see that
$f(T)^n = 2P[f(U)]^nP$,

where $f(U)$ is a contraction and has a unitary dilation. Thus $w(f(T)) \le 1$ using Theorem 2.1-2. □


Notes and References for Section 2.1 


The elementary proof in Theorem 2.1-1 is due to 
C. Pearcy (1966). “An Elementary Proof of the Power Inequality for the 
Numerical Radius,” Mich. Math. J. 13, 289-291. 
The final form of the proof is taken from [H]. The dilation condition equivalent to $w(T) \le 1$ of Theorem 2.1-2 was found by
C. A. Berger (1965). “A Strange Dilation Theorem,” Amer. Math. Soc. Notices 12, 590.
The version used by us is due to
B. Sz.-Nagy and C. Foias (1966). “On Certain Classes of Power-Bounded Operators in Hilbert Space,” Acta Sci. Math. 27, 17-25.

Conditions equivalent to w(T) < 1 have been studied by many authors. 
The conditions 
zT]? 
w(T) < 1 => RezT 1 - ata) >| +I>0, 


for |z| < 1 and ja| < 1 & ||T—zI|| < 1+{1+ |z|}, z € C, and Theorem 
2.1-3 are due to 
C. A. Berger and J. G. Stampfli (1967). “Mapping Theorems for the Nu- 
merical Range,” Amer. J. Math. 89, 1047-1055. 

A characterization of operators with $w(T) \le 1$ was obtained in
T. Ando (1973). “Structure of Operators with Numerical Radius One,” Acta Sci. Math. (Szeged) 34, 11-15.
Such $T$ may be represented as $T = (I + A)^{1/2}B(I - A)^{1/2}$ where $A$ is selfadjoint and $A$ and $B$ are contractions. This is equivalent to the existence of a contraction $C$ such that $T = 2(I - C^*C)^{1/2}C$.

If $w(T) \le 1$ and $T$ is idempotent, then $T$ is selfadjoint. If $T^k = T$ for $k \ge 2$, then $T^{k-1}$ is a projection. For such facts, see
T. Furuta and R. Nakamoto (1971). “Certain Numerical Radius Contraction Operators,” Proc. Amer. Math. Soc. 29, 521-524.
Minimum growth rates of operators $T^n$ with $w(T) \le 1$ were studied by
E. S. W. Shiu (1976). “Growth of the Numerical Ranges of Powers of Hilbert Space Operators,” Mich. Math. J. 23, 155-160.


2.2 Analytic Functions 


We will now look at some mapping theorems for the numerical range W(T) 
and relate W(f(T)) and W(T) when the function f is analytic in various 
regions. 


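Theorem 2.2-1. Let $f(z)$ be holomorphic on $\{z : |z| \le 1\} = D$, and map $D$ into $\{z : \operatorname{Re} z \ge 0\} = P$. If $W(A) \subset D$, then $W(f(A)) \subset P - \operatorname{Re} f(0)$.
Proof. The basic tool in the proof is the following expression for $f(z)$:

(2.2-1) $$f(z) = i\operatorname{Im} f(0) + \frac{1}{2\pi}\int_0^{2\pi} [\operatorname{Re} f(e^{it})]\,\frac{e^{it} + z}{e^{it} - z}\,dt,$$
which is a consequence of Herglotz’s theorem (see Notes and References). Writing

(2.2-2) $$\frac{e^{it} + z}{e^{it} - z} = \frac{2e^{it} - (e^{it} - z)}{e^{it} - z} = \frac{2e^{it}}{e^{it} - z} - 1 = 2(1 - e^{-it}z)^{-1} - 1,$$

we have
(2.2-3) $$\begin{aligned} f(z) &= i\operatorname{Im} f(0) + \frac{1}{2\pi}\int_0^{2\pi} [\operatorname{Re} f(e^{it})][2(1 - e^{-it}z)^{-1} - 1]\,dt \\ &= \frac{1}{\pi}\int_0^{2\pi} [\operatorname{Re} f(e^{it})](1 - e^{-it}z)^{-1}\,dt - \overline{f(0)}. \end{aligned}$$

Replacing $z$ by $A$ and taking the real part, we have
(2.2-4) $$\operatorname{Re} f(A) = -\operatorname{Re} f(0) + \frac{1}{\pi}\int_0^{2\pi} [\operatorname{Re} f(e^{it})]\,\operatorname{Re}[(I - e^{-it}A)^{-1}]\,dt.$$

Using (2.1-1), we have $\operatorname{Re}(I - e^{-it}A)^{-1} \ge 0$. As $f$ maps $D$ into $P$, we have $\operatorname{Re} f(e^{it}) \ge 0$. Hence, the integral on the right is positive and so $\operatorname{Re} f(A) \ge -\operatorname{Re} f(0)$. Thus
$W(f(A)) \subset P - \operatorname{Re} f(0)$. □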


Corollary 2.2-2. Let $f(z)$ be holomorphic on $D$, and map $D$ into $D$. If $f(0) = 0$ and $W(A) \subset D$, then $W(f(A)) \subset D$.

Proof. We can easily verify that the function $g(z) = \dfrac{1 + af(z)}{1 - af(z)}$ for any $a \in \mathbb{C}$, $|a| < 1$, maps $D$ into $P$. Further, $g(0) = 1$. Hence, by Theorem 2.2-1, we have $W(g(A)) \subset P - 1$. So $\operatorname{Re}[g(A) + I] \ge 0$. However, $g(A) + I = 2[I - af(A)]^{-1}$, and therefore $\operatorname{Re}[I - af(A)] \ge 0$. Since $a$ can be chosen arbitrarily, we conclude that

$\operatorname{Re}[I - e^{i\theta}f(A)] \ge 0$ for all $\theta \in (0, 2\pi)$.
Thus $W(f(A)) \subset D$. □
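
The corollary can be illustrated numerically. The sketch below rests on our own assumptions: the function $f(z) = z(z - a)/(1 - az)$ with $a = 0.5$, the random matrix, and the helper names are hypothetical choices made only for illustration. This $f$ is holomorphic on the closed unit disk, maps it into itself, and satisfies $f(0) = 0$.

```python
import numpy as np

def numerical_radius(T, n_angles=720):
    """Grid approximation of w(T) = max over theta of lambda_max(Re(e^{i theta} T))."""
    T = np.asarray(T, dtype=complex)
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.exp(1j * theta) * T
        H = (R + R.conj().T) / 2.0            # Hermitian part of e^{i theta} T
        best = max(best, float(np.linalg.eigvalsh(H)[-1]))
    return best

def f_of_matrix(A, a=0.5):
    """Evaluate f(z) = z (z - a) / (1 - a z) at the matrix A (a real, |a| < 1)."""
    I = np.eye(A.shape[0])
    return A @ (A - a * I) @ np.linalg.inv(I - a * A)

# Hypothetical test matrix, rescaled so that its numerical radius is about 0.99.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = 0.99 * A / numerical_radius(A)

w_A = numerical_radius(A)                     # stays below 1, so W(A) lies in the disk
w_fA = numerical_radius(f_of_matrix(A))       # the corollary predicts a value at most 1
```

The rescaling to $0.99$ rather than $1$ leaves room for the small underestimate made by the angle grid, so that $W(A)$ really lies in the unit disk.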


Notes and References for Section 2.2 


The expression for f(z) in (2.2-1) can be seen in 
M. H. Stone (1932). Linear Transformations in Hilbert Space, American 
Mathematical Society, R.I. 

Theorem 2.2-1 and Corollary 2.2-2 are given in
T. Kato (1965). “Some Mapping Theorems for the Numerical Range,” Proc. Japan Acad. 41, 652-655.
Recently, interesting results for the mapping $f(A) = A^k$ were obtained by
C. K. Li (1995). “Numerical Ranges of the Powers of an Operator,” Preprint.
The following is one of the important results:

For any real ¢@ € [0, 5), K > 2, let Ad be the triangle with vertices at 
L- et ett} if for any n € C, W(A) C ¢, then W(A*) C n* A(K¢). 


cost?’ 


2.3 Rational Functions 


The results in the two previous sections are in some sense local and are 
independent of the behavior of f in the entire plane. A natural extension 
to a global mapping theorem would be that of a rational function. The 
following theorem provides a mapping for rational functions using the con- 
cept of the convex kernel of a set. The kernel K of a set S is the subset of 
S with respect to which it is star-shaped. 


Theorem 2.3-1. Let $f$ be a rational function with $f(\infty) = \infty$. For any compact convex set $F$ in $\mathbb{C}$, let $E = f^{-1}(F)$ and $K$ be the convex kernel of $E$. Then, for any operator $A$ with $W(A) \subset \operatorname{Int} K$, we have $W(f(A)) \subset F$.
Proof. Let $\ell$ be a line of support and tangent to $F$ at $b$. We shall prove that $W(f(A))$ is contained in the closed half-plane bounded by $\ell$ and containing $F$. Let $\theta$ be the angle between the real axis and $\ell$ oriented in such a way that $F$ lies to the left of $\ell$. We can assume that $b$ is not a branch point of $f^{-1}$ as there are only a finite number of them. The poles of $[f(z) - b]^{-1}$ are all simple, and we can use the partial fraction expansion (as $f(\infty) = \infty$)

$[f(z) - b]^{-1} = \sum_i [f'(\zeta_i)(z - \zeta_i)]^{-1}$, with $f'(\zeta_i) \ne 0$.
Replacing $z$ by $A$,
$[f(A) - bI]^{-1} = \sum_i [f'(\zeta_i)(A - \zeta_i I)]^{-1}$.

We thus have

$\operatorname{Im}\,[e^{-i\theta}(f(A) - bI)]^{-1} = \sum_i \operatorname{Im}\,[e^{-i\theta}f'(\zeta_i)(A - \zeta_i I)]^{-1}$.
We will prove that for each $i$

(2.3-1) $\operatorname{Im}\,[e^{-i\theta}f'(\zeta_i)(A - \zeta_i I)] \ge 0$.
This, in turn, implies that
(2.3-2) $\operatorname{Im}\,[e^{-i\theta}(f(A) - bI)] \ge 0$

since, for any invertible operator $T$, $\operatorname{Im} T \ge 0$ is equivalent to $\operatorname{Im} T^{-1} \le 0$.

At any point $\zeta_i$, we have $f(\zeta_i) = b$ and $f'(\zeta_i) \ne 0$. Thus, there is a neighborhood of $\zeta_i$ on which $f$ is conformal and maps it into a neighborhood of $b$. Since $b \in \partial F$, we have $\zeta_i \in \partial E$, which has a tangent $\ell_i$ at $\zeta_i$. If $\theta_i$ is the inclination of $\ell_i$, we have $e^{-i\theta}f'(\zeta_i) = e^{-i\theta_i}|f'(\zeta_i)|$. Since $W(A)$ lies in the half-plane to the left of $\ell_i$, we have

$\operatorname{Im}\,[e^{-i\theta_i}(A - \zeta_i I)] \ge 0$

and hence (2.3-1) holds.

We can easily see that, in the above argument, the assumption that $\ell$ is a line of support and a tangent is sufficiently general, since the exceptions are at most countable. Further, if $\operatorname{Int} K$ is empty, there is nothing to be proved. If $K$ is contained in a straight line, then $A$ is normal and

(2.3-3) $W(f(A)) = \operatorname{co} \sigma(f(A)) = \operatorname{co} f(\sigma(A)) \subset \operatorname{co} f(W(A))$.

Since $f(K) \subset F$ and $F$ is convex, we have $W(f(A)) \subset F$. □
We remark that the condition $W(A) \subset \operatorname{Int} K$ can be modified to $W(A) \subset K$ by using the continuity of the numerical range.
The important theorem on mapping rational functions is due to
T. Kato (1965). “Some Mapping Theorems for the Numerical Range,” Proc. Japan Acad. 41, 652-655.

There is some gap between Section 2.2 and Section 2.3. For example, in the paper just cited, it is shown that if $f(z)$ is a function analytic on the half-plane $P = \{z : \operatorname{Re} z \ge 0\}$ and $A$ is a bounded operator with numerical range $W(A)$ contained in $P$, then $W(f(A))$ is contained in the closed convex hull of $f(P)$. But one cannot generate half-plane numerical range mapping theorems automatically from disk mapping theorems because linear fractional transformations such as $f(z) = (1 - z)(1 + z)^{-1}$, which map $D$ onto $P$, do not map the numerical range in the same way.

A related gap is the extension of numerical range mapping theorems from the bounded to the unbounded case. For example, the Cayley transform from an unbounded symmetric $A$ to the bounded isometric $V = (A - iI)(A + iI)^{-1}$ will not carry the numerical ranges in the same way.


2.4 Operator Products 


We saw earlier that the numerical radius behaves better than the numer- 
ical range under mappings. Let us now consider the two basic algebraic 
operations of linear structure, addition and multiplication. Because the 
numerical range is subadditive, 


$W(A + B) \subset W(A) + W(B)$,


it is often relatively easy to determine inclusion sets for the numerical 
range of a sum. As the example given at the beginning of this chapter 
shows, general results for W(AB) are harder. For example, the mapping 
theorems in the previous sections apparently are not very applicable to the 
multiplicative situation. In the next chapter, we will develop methods that 
are more natural to it. 

In the next section, some results for W(AB) will be obtained for A and 
B commuting selfadjoint or normal operators. In the present section, we 
are able to obtain some information about the spectrum of a product AB 
from the numerical ranges W(A) and W(B). 


Theorem 2.4-1. If $0 \notin \overline{W(A)}$, then

(2.4-1) $\sigma(A^{-1}B) \subset \dfrac{\overline{W(B)}}{\overline{W(A)}} = \left\{\dfrac{\lambda}{\mu} : \lambda \in \overline{W(B)},\ \mu \in \overline{W(A)}\right\}$.

Proof. Since $0 \notin \overline{W(A)}$ and $\sigma(A) \subset \overline{W(A)}$, we see that $A$ is invertible. Now let $\lambda \in \sigma(A^{-1}B)$. Hence, $0 \in \sigma(A^{-1}B - \lambda I)$. However,

$A^{-1}B - \lambda I = A^{-1}(B - \lambda A)$.
Hence, if $0 \in \sigma(A^{-1}B - \lambda I)$, then $0 \in \sigma(B - \lambda A)$. Thus $0 \in \overline{W(B - \lambda A)}$ or equivalently $\lambda \in \overline{W(B)}/\overline{W(A)}$.

Theorem 2.4-1 has a number of applications. Let us note three of them here.

An operator is called positively stable if all the elements of its spectrum are in the right half-plane. In particular, a selfadjoint operator is stable if it is positive definite.


Theorem 2.4-2. Let P be positive definite and A stable. Then PA is 
stable. 


Proof. By Theorem 2.4-1, we have

(2.4-2) $\sigma(PA) = \sigma((P^{-1})^{-1}A) \subset \dfrac{\overline{W(A)}}{\overline{W(P^{-1})}}$.
Since $\overline{W(P^{-1})} = \overline{W(P)}^{\,-1}$ has only positive real elements and $\overline{W(A)}$ has elements with positive real parts, we conclude that $PA$ is stable. Note that $P$ and $A$ are invertible and so is $PA$. □

To study another application, recall that any operator $T$ has a polar decomposition $T = UP$. If $T$ is invertible, then $U$ is unitary and $P$ is positive definite. A unitary operator $U$ is said to be cramped if its spectrum is contained in an arc of the unit circle with central angle less than $\pi$.


Theorem 2.4-3. If $0 \notin \overline{W(T)}$, then the unitary part of $T$ is cramped.


Proof. Let $T = UP$. $0 \notin \overline{W(T)}$ implies that the convex set $\overline{W(T)}$ is contained in a sector

(2.4-3) $S = \{re^{i\theta} : r > 0,\ \theta_1 < \theta < \theta_2\}$,

where the difference $\theta_2 - \theta_1 < \pi$.
So, $U = TP^{-1}$, and since $0 \notin \overline{W(T^{-1})}$, using Theorem 2.4-1, we have

$\sigma(U) \subset \dfrac{\overline{W(P^{-1})}}{\overline{W(T^{-1})}}$.

Notice that $\overline{W(P^{-1})}$ is a closed interval on $\mathbb{R}$. If $z \in W(T^{-1})$, we have $z = \langle T^{-1}x, x\rangle$ for some $x$, $\|x\| = 1$. So $z = \langle y, Ty\rangle$, where $y = T^{-1}x$ and $\arg\langle y, Ty\rangle = -\arg\langle Ty, y\rangle$. Hence $\overline{W(P^{-1})}/\overline{W(T^{-1})}$ is contained in the sector $\theta_1 < \theta < \theta_2$. So $\sigma(U)$ is contained in the set $\{e^{i\theta} : \theta_1 < \theta < \theta_2\}$. □

In the case of unitary $A$ and $B$ we can determine $W(AB)$ exactly as follows. The assumption is that $0 \notin \overline{W(A)} \cap \overline{W(B)}$, which is equivalent to assuming that one of the operators has a spectrum of arc length less than $\pi$, i.e., that one of the operators has cramped spectrum.
Theorem 2.4-4. Let $A$ and $B$ be unitary operators on a Hilbert space such that $\sigma(A) \subset \text{arc}\,\Gamma_A$ and $\sigma(B) \subset \text{arc}\,\Gamma_B$, where either arc length $\Gamma_A < \pi$ or arc length $\Gamma_B < \pi$. Then

(2.4-4) $W(AB) \subset \text{cl conv hull}\,(\Gamma_A\Gamma_B)$.


Proof. Here $\Gamma_A\Gamma_B$ denotes the set of points $z_Az_B$, $z_A \in \Gamma_A$, $z_B \in \Gamma_B$. We need consider only the case arc length $\Gamma_A < \pi$ because $A$ and $B$ are interchangeable, due to the fact that $\Gamma_A\Gamma_B = \Gamma_B\Gamma_A$.

Since $A$ is unitary, we have $\overline{W(A)} \subset \text{cl conv hull}\,(\Gamma_A)$; thus arc length $\Gamma_A < \pi$ $\Rightarrow$ $0 \notin \overline{W(A)}$ $\Rightarrow$ $0 \notin \overline{W(A^{-1})}$. By Theorem 2.4-1, on the spectrum of a product we have $\sigma(AB) \subset \overline{W(B)}/\overline{W(A^*)} \subset S$, where $S$ denotes the sector $\alpha \le \theta \le \beta$, where $\alpha = \inf\{\theta_1 + \theta_2 \mid e^{i\theta_1} \in \Gamma_A,\ e^{i\theta_2} \in \Gamma_B\}$ and where $\beta$ is the supremum of the same set. Since $AB$ is unitary, its numerical range closure is the convex hull of its spectrum, and hence

$W(AB) \subset \text{cl conv hull}\,(\Gamma_A\Gamma_B)$. □


Notes and References for Section 2.4 


Theorem 2.4-1 was shown by 

J. P. Williams (1967). “Spectra of Products and Numerical Ranges,” J. 
Math. Anal. Appl. 17, 214-220. 

Earlier versions for matrices were given by 

H. Wielandt (1951). National Bureau of Standards Report 1367, Washing- 
ton, D.C. 

Versions of Theorems 2.4-3 and 2.4-4 were given by 

S. Berberian (1964). “The Numerical Range of a Normal Operator,” Duke 
Math. J. 31, 479-483. 


2.5 Commuting Operators 


In this section, we will present some results concerning $W(AB)$ and $w(AB)$ when $AB = BA$. As the example at the beginning of this chapter shows, not much can be expected generally of the numerical range $W(AB)$. However, in some special instances, one can say something, such as the following.


Theorem 2.5-1. Let $A$ be a nonnegative, selfadjoint operator and $AB = BA$. Then $W(AB) \subset W(A)W(B)$.


Proof. $\langle ABx, x\rangle = \langle BA^{1/2}x, A^{1/2}x\rangle$, where $A^{1/2}$ is the nonnegative square root of $A$. Thus $\langle ABx, x\rangle = \langle Bg, g\rangle\|A^{1/2}x\|^2 = \langle Bg, g\rangle\langle Ax, x\rangle$, where $g = A^{1/2}x/\|A^{1/2}x\|$ with $A^{1/2}x \ne 0$. □
It turns out that more can be expected for the numerical radius $w(AB)$, in view of the power inequality, Theorem 2.1-1.

First, let us note some readily available facts. Simple examples show that $w(AB)$ can exceed the product $w(A)w(B)$. These examples show that this can be the case even when $A$ and $B$ commute. For example, take the right shift

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Then by (1.3-4), we know that $w(A) = \cos(\pi/5) = 0.80901699$. On the other hand, straightforward computation (similar to that of the first example in the book) shows that $w(A^2) = w(A^3) = 0.5$, so that

$0.5 = w(A \cdot A^2) > w(A) \cdot w(A^2) = 0.4045085$.
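
The numbers quoted for the $4 \times 4$ right shift can be reproduced with a few lines of code. This is a minimal sketch: the helper `numerical_radius` is our own hypothetical name, and it approximates $w$ by maximizing the top eigenvalue of $\operatorname{Re}(e^{i\theta}A)$ over a grid of angles.

```python
import numpy as np

def numerical_radius(T, n_angles=720):
    """Grid approximation of w(T) = max over theta of lambda_max(Re(e^{i theta} T))."""
    T = np.asarray(T, dtype=complex)
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.exp(1j * theta) * T
        H = (R + R.conj().T) / 2.0            # Hermitian part of e^{i theta} T
        best = max(best, float(np.linalg.eigvalsh(H)[-1]))
    return best

A = np.diag(np.ones(3), k=-1)                 # 4x4 right shift: ones on the subdiagonal

w1 = numerical_radius(A)                      # compare with cos(pi/5)
w2 = numerical_radius(A @ A)                  # compare with 0.5
w3 = numerical_radius(A @ A @ A)              # compare with 0.5
gap = w3 - w1 * w2                            # positive: w(A * A^2) exceeds w(A) w(A^2)
```

For the shift the top eigenvalue of $\operatorname{Re}(e^{i\theta}A)$ does not depend on $\theta$, so the grid introduces no error in this particular example.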


In the affirmative direction, we easily have the following results.


Theorem 2.5-2. It is always the case that

$w(AB) \le 4w(A)w(B)$.
When $AB = BA$, it always holds that

$w(AB) \le 2w(A)w(B)$.


Proof. From norm equivalence, Theorem 1.3-1, we know that
$w(AB) \le \|A\|\,\|B\| \le 4w(A)w(B)$.


In the commuting case, we may assume $w(A) = w(B) = 1$ and show that $w(AB) \le 2$. Because of Theorem 1.3-1, the numerical radius is a norm. Hence, by the triangle inequality, the power inequality theorem (Theorem 2.1-1), and the subadditivity of $w$, we have

(2.5-1) $$\begin{aligned} w(AB) &= w\Big(\tfrac14\big[(A + B)^2 - (A - B)^2\big]\Big) \\ &\le \tfrac14\big[w((A + B)^2) + w((A - B)^2)\big] \\ &\le \tfrac14\big[(w(A + B))^2 + (w(A - B))^2\big] \\ &\le \tfrac14\big[(w(A) + w(B))^2 + (w(A) + w(B))^2\big] \\ &= 2. \end{aligned}$$ □
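
The first equality in (2.5-1) is the only place where commutativity enters. A short expansion, included here only as a reminder, makes this explicit:

$$(A + B)^2 - (A - B)^2 = (A^2 + AB + BA + B^2) - (A^2 - AB - BA + B^2) = 2(AB + BA),$$

so that $\tfrac14[(A + B)^2 - (A - B)^2] = \tfrac12(AB + BA)$, which equals $AB$ precisely when $AB = BA$.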


Next, we turn to the case in which $A$ and $B$ double commute: $AB = BA$ and $AB^* = B^*A$. Note that by taking adjoints, $B$ and $A$ also double commute. Under this assumption, one can get closer to the elusive $w(AB) \le w(A)w(B)$ situation, namely, one can show that $w(AB) \le w(A)\|B\|$. To prove this, we first arrange two lemmas.


Lemma 2.5-3. Let $A$ be a unitary operator that commutes with another operator $B$. Then $w(AB) \le w(B)$.


Proof. Assuming that $w(B) = 1$, we have
$\operatorname{Re}\,\langle (I - zB)f, f\rangle \ge 0$, $|z| \le 1$.
In particular,
$\operatorname{Re}\,\langle (I - e^{i\theta}B)f, f\rangle \ge 0$ for $\theta \in [0, 2\pi]$.


Therefore,


(2.5-2) $\operatorname{Re}\displaystyle\int_0^{2\pi} \langle (I - e^{i\theta}B)f, f\rangle\,dE_\theta \ge 0$,


where $\{E_\theta\}$ is the spectral family for $A$ over the segment $0 \le \theta \le 2\pi$. The family $E_\theta$ can be approximated by a sequence of polynomials in $A$ and $A^*$. Since $A$ and $A^*$ commute with $B$ and the integral, and the inner products are continuous with respect to their arguments, we have


(2.5-3) $0 \le \operatorname{Re}\displaystyle\int_0^{2\pi} \langle (I - e^{i\theta}B)f, f\rangle\,dE_\theta = \operatorname{Re}\,\langle (I - AB)f, f\rangle$. □


Lemma 2.5-4. Let $A$ be an isometry and $AB = BA$. Then $w(AB) \le w(B)$.


Proof. $A^*A = I$ and hence $\langle ABf, f\rangle = \langle A^*AABf, f\rangle = \langle ABAf, Af\rangle$. So we need to consider only the restrictions to $R(A)$, which is closed. Notice that on the range of $A$, $A$ is unitary, because $A^*A = I$ and for any $f = Ag \in R(A)$, we have $AA^*f = AA^*Ag = Ag = f$. So $AA^* = I$ on $R(A)$. Since $A$ is unitary on $R(A)$, by Lemma 2.5-3 we have $w(AB) \le w(B)$ on $R(A)$. □


Theorem 2.5-5 (Double commute). If the operators $A$ and $B$ double commute, then $w(AB) \le w(B)\|A\|$.


Proof. Without loss of generality, we may scale $A$ so that $\|A\| \le 1$. Then, it is enough to show that $w(AB) \le w(B)$. Let $D = (I - A^*A)^{1/2}$ be the positive square root, and consider the two operators $S$ and $T$ on $K = H \oplus H \oplus H \oplus \cdots$ defined by


$S(f_1, f_2, f_3, \ldots) = (Af_1, Df_1, f_2, f_3, \ldots)$,

(2.5-4) $T(f_1, f_2, f_3, \ldots) = (Bf_1, DBD^{-1}f_2, DBD^{-1}f_3, \ldots)$.




Then $S$ and $T$ commute and $S$ is an isometry, the former because $D^{-1}$ commutes with $B$, the latter from using $D^2 = I - A^*A$ in


$\|Sf\|^2 = \|Af_1\|^2 + \|Df_1\|^2 + \|f\|^2 - \|f_1\|^2 = \|f\|^2$.


By Lemma 2.5-4, we therefore have $w_K(ST) \le w_K(T)$ for the dilations $S$ and $T$ of $A$ and $B$. Since the numerical range of a direct sum operator is the convex hull of the summands’ numerical ranges, we have $w(AB) \le w_K(ST)$, $ST$ being a dilation of $AB$. Moreover, $w_K(T) = \max\{w(B), w(DBD^{-1})\} = w(B)$. □


Corollary 2.5-6. Let $A$ be a normal operator commuting with $B$. Then $w(AB) \le w(A)w(B)$.


Proof. By Fuglede’s theorem (see [H]), $A$ commutes with $B$ and $B^*$, and $\|A\| = w(A)$ when $A$ is normal. □

The following result covers some operators that are not normal but are square roots of a scalar; for example, the matrix

$$\begin{pmatrix} 0 & a \\ b & 0 \end{pmatrix}.$$

Theorem 2.5-7. Let $AB = BA$ and $A^2 = \alpha I$, where $\alpha \in \mathbb{C}$. Then $w(AB) \le w(B)\|A\|$.


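Proof. Let $H_1$ be the Hilbert space $H$ renormed by $\langle f, g\rangle_1 = \langle Df, Dg\rangle$, where
$D^2 = I + A^*A + (A^*)^2A^2 + \cdots + (A^*)^nA^n + \cdots$,

where we assume $\|A\| < 1$. Let $K = H_1 \oplus H_1 \oplus \cdots$, and define the operators $S$ and $T$ on $K$ by

(2.5-5) $S(f_1, f_2, \ldots) = (Af_1, D^{-1}f_1, f_2, f_3, \ldots)$
and
(2.5-6) $T(f_1, f_2, \ldots) = (Bf_1, D^{-1}BDf_2, D^{-1}BDf_3, \ldots)$.


Notice that $ST = TS$ and $w_K(T) = w(B)$, so if $f = (f_1, f_2, \ldots)$, we have
$\langle STf, f\rangle_K = \langle D^2ABf_1, f_1\rangle + \langle Bf_1, Df_2\rangle + \langle BDf_2, Df_3\rangle + \cdots$.
Choosing $Df_2 = (I - A^*A + (A^*)^2A^2 - (A^*)^3A^3 + \cdots)f_1$, we observe that

$\langle [I + A^*A + (A^*)^2A^2 + \cdots]ABf_1, f_1\rangle \le \langle [I + A^*A + (A^*)^2A^2 + \cdots]f_1, f_1\rangle$.
Since $A^2 = \alpha I$, we have

(2.5-7) $\langle [1 + |\alpha|^2 + |\alpha|^4 + \cdots]ABf_1, f_1\rangle \le \langle [1 + |\alpha|^2 + |\alpha|^4 + \cdots]f_1, f_1\rangle$. □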
Notes and References for Section 2.5


In the commuting case it was observed by 

K. Gustafson (1968). “The Angle of an Operator and Positive Operator 
Products,” Bull. Amer. Math. Soc. 74, 488-492. 

that for positive selfadjoint $A$ and bounded accretive $B$, $BA$ is accretive. Later this was extended to $W(AB) \subset W(A)W(B)$ for positive selfadjoint $A$ and bounded commuting $A$ and $B$, in

R. Bouldin (1970). “The Numerical Range of a Product,” J. Math. Anal. Appl. 32, 459-467.

These commuting results are rather immediate by use of $A^{1/2}$.

Theorem 2.5-2 for commuting A and B was observed in 

J. Holbrook (1969). “Multiplicative Properties of the Numerical Radius in 
Operator Theory,” J. Reine Angew. Math. 237, 166-174. 
This was the most important paper for these commuting operator numerical 
radius questions. In particular, not only the first but three proofs of the 
double commute result (Theorem 2.5-5) were given there. Two of these 
proofs used dilation theory (see the next section). 

Theorem 2.5-5 was extended by 

R. Bouldin (1971). “The Numerical Range of a Product, II,” J. Math. 
Anal. Applic. 33, 212-219. 
Lemma 2.5-4 is obtained there as well, its proof depending on the dilation 
theory. Our proof of Lemma 2.5-4 does not need the dilation methods. 
However, our proof of Theorem 2.5-5 is essentially that of the just cited 
paper. 

Historically, Corollary 2.5-6 preceded Theorem 2.5-5 and is easily shown by use of the spectral theorem for normal operators. More generally, we may say from Theorem 2.5-5 that the sought inequality $w(AB) \le w(A)w(B)$ holds for all double commuting $A$ and $B$ whenever $A$ is also normaloid: $w(A) = \|A\|$. Normaloid operators will be discussed further in Chapter 6.

Theorem 2.5-7 was given by 
D. K. Rao (1994). “Rango Numerico de Operadores Conmutativos,” Re- 
vista Colombiana de Matematicas 27, 231-233. 

The question of whether, when $A$ and $B$ commute,


(2.5-8) $w(AB) \le w(A)\|B\|$


holds was open for about twenty years. It was finally shown to be false by a counterexample in

V. Müller (1988). “The Numerical Radius of a Commuting Product,” Michigan Math. J. 35, 255-260.

Müller’s approach was computational, and the counterexample was found in a 12-dimensional Hilbert space. The counterexample can even be constructed with $B$ a polynomial in $A$.
The related question of the best constants for the inequality
(2.5-9) $w(AB) \le Cw(A)\|B\|$


for commuting $A$ and $B$ has also been considered; see
K. Okubo and T. Ando (1976). “Operator Radii of Commuting Products,” Proc. Amer. Math. Soc. 56, 203-210.
K. Davidson and J. Holbrook (1988). “Numerical Radii of Zero-One Matrices,” Michigan Math. J. 35, 261-267.

The last cited paper also gives a simple counterexample to the inequality (2.5-8). Let

$$S = S_9 = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}$$
be the left shift in $\mathbb{C}^9$, and let $T = S^3 + S^7$, that is, the $9 \times 9$ matrix with ones on the third and seventh superdiagonals and zeros elsewhere, whose first row is $(0\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0)$, so that

(2.5-10) $TS = S^4 + S^8$

is the matrix with ones on the fourth and eighth superdiagonals, whose first row is $(0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 1)$.


Clearly, $TS = ST$. By (1.3-4) we know that $w(S) = \cos(\pi/10) = 0.95105652$. It can be shown (see Section 5.3) that $w(T)$ has the same value, but $w(TS) = 1$. Since $\|S\| = 1$, the constant in (2.5-9) for this example is $C = 1.0514622$.
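
The values quoted for this counterexample can be confirmed numerically. The sketch below assumes that the shift is the $9 \times 9$ matrix with ones on the first superdiagonal and that $T = S^3 + S^7$, which is how we read the matrix displays; the helper `numerical_radius` is a hypothetical name of our own.

```python
import numpy as np

def numerical_radius(T, n_angles=720):
    """Grid approximation of w(T) = max over theta of lambda_max(Re(e^{i theta} T))."""
    T = np.asarray(T, dtype=complex)
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.exp(1j * theta) * T
        H = (R + R.conj().T) / 2.0            # Hermitian part of e^{i theta} T
        best = max(best, float(np.linalg.eigvalsh(H)[-1]))
    return best

S = np.diag(np.ones(8), k=1)                  # 9x9 shift with ones on the superdiagonal
T = np.linalg.matrix_power(S, 3) + np.linalg.matrix_power(S, 7)
TS = T @ S                                    # equals S^4 + S^8

w_S = numerical_radius(S)                     # compare with cos(pi/10)
w_T = numerical_radius(T)                     # compare with cos(pi/10)
w_TS = numerical_radius(TS)                   # compare with 1
norm_S = np.linalg.norm(S, 2)                 # operator norm of the shift
ratio = w_TS / (w_T * norm_S)                 # lower bound for C in (2.5-9) with A = T, B = S
```

All three matrices have nonnegative entries, so the maximum over angles is attained at $\theta = 0$, which belongs to the grid; the computed values are therefore exact up to rounding.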
2.6 Dilation Theory


Because the dilation theory of Sz.-Nagy, Foias, Berger, Halmos, Ando, and 
others has impacted the numerical range theory, we want to explain some of 
that theory briefly in this section. Recall that some of the proofs in Section 
2.1 and Section 2.5 employed dilations. There are many dilation theories 
(see the Notes). Here we will describe only those based on extending the 
Hilbert space by direct sums. 

Let us consider the matrix case first. For $n \times n$ matrices $A$ and $B$, the direct sum is the $2n \times 2n$ matrix

$$A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$$

on $H \oplus H$. By the direct sum property (easily verified; see [H]),
(2.6-1) $W(A \oplus B) = \text{conv hull}\,(W(A) \cup W(B))$,


one immediately sees many opportunities for constructing and manipulating numerical ranges of $A$ extended to direct sums, finite or infinite.
Matrix dilations are a generalization of direct sums and, in connection with numerical ranges, open up even further opportunities for numerical range constructions. For any $n \times m$ matrix $A$, any strictly larger matrix
$$\tilde{A} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

is called a dilation of $A$ to a larger space. For example, for any $n \times n$ $A$, one can always get a small ($2n \times 2n$) normal dilation; for example,


(2.6-2) $\tilde{A} = \begin{pmatrix} A & A^* \\ A^* & A \end{pmatrix}$.


By the submatrix inclusion property (this is easily verified; see Theorem 5.1-2 or [M]), $W(A) \subset W(\tilde{A})$ for any matrix dilation $\tilde{A}$. It is relatively easy to show (see, e.g., [HJ2]) that


(2.6-3) $W(A) = \bigcap W(\tilde{A})$


over all $2n \times 2n$ normal dilations of $A$.
Hilbert space dilations and their use in numerical range theory generalize these ideas. Any extension $\tilde{T}$ of an operator $T$ to a Hilbert space $K$ strictly containing $H$ such that


(2.6-4) $T = P_H\tilde{T}$ on $H$,


where $P_H$ denotes the orthogonal projection of $K$ onto $H$, is called a dilation of $T$. It becomes a strong dilation if


(2.6-5) $T^n = P_H\tilde{T}^n$ on $H$
for all positive integers $n$. If $\tilde{T}$ is unitary and
(2.6-6) $T^n = \rho P_H\tilde{T}^n$ on $H$


it is called a strong $\rho$ unitary dilation.

For arbitrary contractions ($\|T\| \le 1$), and all bounded operators $T$ can be made contractions by normalizing them, one may get a rather small ($H \oplus H$) unitary dilation of $T$, namely


(2.6-7) $\tilde{T} = \begin{pmatrix} T & (I - TT^*)^{1/2} \\ (I - T^*T)^{1/2} & -T^* \end{pmatrix}$,


where the off-diagonal operators are the unique selfadjoint positive semidefinite square roots guaranteed by the nonnegativity


(2.6-8) $\langle (I - TT^*)x, x\rangle = \|x\|^2 - \|T^*x\|^2 \ge 0$.


The dilation (2.6-7) turns out to be useful, even though it does not by itself have the operator power properties desired of a strong dilation. To note the latter, take the trivial case of $T$ selfadjoint; then $\tilde{T}^2 = I$! The key to the unitarity of the dilation (2.6-7) is the block intertwining property


(2.6-9) $(I - TT^*)^{1/2}T = T(I - T^*T)^{1/2}$,
which follows from the identity $T(I - T^*T) = (I - TT^*)T$.
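
For matrices, the unitarity of this $H \oplus H$ dilation and its failure to be a strong dilation can both be seen in a short computation. The following is a sketch under our own assumptions: it takes the dilation to be the block matrix with $T$ and $-T^*$ on the diagonal and the two defect square roots off the diagonal, and the helper `psd_sqrt`, the random contraction, and the variable names are hypothetical.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)           # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

# Hypothetical contraction with operator norm 0.9.
rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = 0.9 * T / np.linalg.norm(T, 2)
I = np.eye(n)
Ts = T.conj().T

# Assumed block form of the dilation on H (+) H.
T_dil = np.block([[T, psd_sqrt(I - T @ Ts)],
                  [psd_sqrt(I - Ts @ T), -Ts]])

# Unitarity: the dilation times its adjoint should be the identity on H (+) H.
unitarity_defect = np.linalg.norm(T_dil.conj().T @ T_dil - np.eye(2 * n))

# The compression of the dilation is T, but the compression of its square is not T^2.
compression = T_dil[:n, :n]
power_defect = np.linalg.norm((T_dil @ T_dil)[:n, :n] - T @ T)
```

The nonzero `power_defect` is the product of the two defect operators, which is what a strong dilation has to eliminate.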


A main result in the Hilbert space dilation theory is the following. 


Theorem 2.6-1. For every contraction $T$ on a Hilbert space $H$, there exists a unitary dilation $U$ on some Hilbert space $K$ containing $H$ as a subspace such that


$T^nx = P_HU^nx$, $n = 0, 1, 2, 3, \ldots$
One can require that $U$ be minimal, in the (cyclic) sense that
$\bigvee_{n=-\infty}^{\infty} U^n(H) = K$.


$U$ is then determined uniquely, up to isometric isomorphism.


Proof. Several proofs have been given (see the Notes). The first started with the fact that $T$ contractive implies the positivity


(2.6-10) $\operatorname{Re}\Big[\langle x, x\rangle + 2\displaystyle\sum_{n=1}^{\infty} \lambda^n\langle T^nx, x\rangle\Big] \ge 0$


for all $|\lambda| < 1$. From this the Stieltjes integral representation of analytic functions with nonnegative real part in the unit circle leads to the representation


(2.6-11) $T^n = \displaystyle\int_0^{2\pi} e^{in\theta}\,d\mu_\theta$,
where $\mu_\theta$ is a measure on $[0, 2\pi]$ (obtained as a boundary value limit). By a result of Neumark, this measure may be “dilated” to a unitary spectral measure $E$, i.e., $\mu(B)x = P_HE(B)x$ for all Borel sets $B$, and from this one takes the unitary dilation of $T$ to be $U = \int_0^{2\pi} e^{i\theta}\,dE_\theta$. Later proofs utilized a construction like (2.6-7) placed within a setting of an infinite direct sum of copies of $H$ and coupled to some shifts therein, similar to the constructions used in Section 2.5. □

The $\rho$-dilations are the most interesting ones for use in connection with the numerical range $W(T)$. To motivate this, note that the rather easy matrix normal dilation (Eq. (2.6-2)) has square

$$(\tilde{A})^2 = \begin{pmatrix} A^2 + (A^*)^2 & AA^* + A^*A \\ A^*A + AA^* & (A^*)^2 + A^2 \end{pmatrix},$$

and in the trivial case where $A$ is symmetric, we see that $A^2 = \tfrac12 P_H(\tilde{A})^2$ on $H$. The idea is to go the other way, to push $\rho$ to as high a value as possible. This is the meaning of the dilation used in the proof of the power inequality in Section 2.1, that from numerical radius $w(T) \le 1$ we are guaranteed a unitary dilation $\tilde{T}$ of $T$ such that $T^n = 2P_H\tilde{T}^n$. In other words, we want the dilation to have good numerical range and numerical radius properties and still, when projected, we want it to “shrink” $T$ and its powers $T^n$ as much as possible. It is easy to see from the basic relation $w(T) \le \|T\| \le 2w(T)$ that 2 is the sharpest $\rho$ that can be expected. For the power inequality, the strong dilation must achieve 2 exactly.


Notes and References for Section 2.6 


The first step in the dilation theory seems to have been that of 
P. Halmos (1950). “Normal Dilations and Extensions of Operators,” Summa 
Brasil. Math. 2, 125-134. 
where the dilation (2.6-7) was given. Theorem 2.6-1 was proved by 
B. Sz.-Nagy (1953). “Sur les Contractions de l’espace de Hilbert,” Acta 
Sci. Math. 15, 87-92. 
This was the first general operator strong dilation result and had a substantial impact on the harmonic analysis of operators by use of extensions of them to larger Hilbert spaces. The term “harmonic analysis” is appropriate in terms of the proof of Theorem 2.6-1 we sketched, as (2.6-11) may be regarded as a “Fourier coefficient” of the measure $\mu_\theta$ on the unit circle.
The full theory of such dilations may be found in the book 
B. Sz.-Nagy and C. Foias (1970). Harmonic Analysis of Operators on 
Hilbert Space, North Holland. 

At the end of the proof of Theorem 2.6-1 we mentioned a later proof that 
brings in Halmos’ dilation construction (2.6-7). This was given by 
J. J. Schaffer (1955). “On Unitary Dilations of Contractions,” Proc. Amer. 
Math. Soc. 6, 322. 
Schaffer’s proof is noteworthy for its brevity and carries the further advantage of directly exhibiting the strong dilation.

Next came a question of whether two commuting contractions $T_1$ and $T_2$ could have commuting strong unitary dilations. This was proved by
T. Ando (1961). “On a Pair of Commutative Contractions,” Acta Sci. 
Math. 24, 88-90. 

The result does not extend to three commuting operators. 

It is known (generalized Wold decomposition; see, e.g., [H]) that every 
contraction T may be uniquely decomposed as the direct sum of a unitary 
part and a completely nonunitary part. The latter has a minimal dilation, 
which is the skew sum of two bilateral shifts. There are connections be- 
tween this theory and the incoming/outgoing subspaces decompositions of 
scattering theory, see 
P. Lax and R. S. Phillips (1967). Scattering Theory, Academic Press, New 
York. 

Indeed, group representation theory is an earlier “dilation” theory, but the 
purposes and results are quite different. 

Just as parts of operator theory may be somewhat irreverently viewed as extending complex numbers $z = e^{i\theta}r$ to operators $T = U|T|$, both written here in their polar forms, where $|T| = (T^*T)^{1/2}$ and $U$ is the partial isometry mapping the range of $|T|$ to the range of $T$, some of the dilation theory, especially that originated by Sz.-Nagy and Foias, has some motivation in the Hardy space theory of decompositions of a function $h$ (in applications, in the frequency domain) as $h = io$, where $i$ is an inner function and $o$ is an outer function. Remember that inner functions may be thought of as Blaschke products, which absorb all upper half-plane zeros from $h$. For our own version of this, see
R. Goodrich and K. Gustafson (1981). “Weighted Trigonometric Ap- 
proximation and Inner—Outer Functions on Higher Dimensional Euclidean 
Spaces,” J. Approx. Theory 31, 368-382. 

R. Goodrich and K. Gustafson (1986). “Spectral Approximation,” J. Ap- 
prox. Theory 48, 272-293. 

One serious deficiency of the Hilbert space operator dilation theories we 
have described is that in certain concrete applications in specific function 
space settings, they will not carry into the dilations needed lattice proper- 
ties (e.g., preservation of positivity of functions representing probabilities). 
To that end, another kind of dilation theory was started by 
M. Akcoglu (1975). “Positive Contractions on L,-Spaces,” Math. Zeit. 
143, 1-13. 

For a good treatment of that dilation theory, see 

M. Kern, R. Nagel, and G. Palm (1977). “Dilations of Positive Operators: 
Construction and Ergodic Theory,” Math. Zeit. 156, 265-267. 

In a recent paper, we employ the Akcoglu dilation theory 

I. Antoniou and K. Gustafson (1996). “Dilation of Markov Processes to 
Dynamical Systems,” preprint. 
to answer a question of dilating a probabilistic Markov semigroup $M_t$ to a deterministic evolution in a larger space.

Although the Sz.-Nagy-Foias et al. Hilbert space operator dilation theory dilates a continuous positive family $W_t$ of operators to a unitary family $U_t$ on a larger space, and although the dilations there can always in fact be given a positivity structure, we did not see how to do that in such a way as to accommodate also a meaningful extension of the underlying phase space dynamics. The Akcoglu dilation construction often can be shown to preserve phase space dynamical characteristics, such as ergodicity, and in our case was useful to distinguish physical reversibility and irreversibility.

A third dilation theory is that of Kolmogorov—Rokhlin; see 
I. P. Cornfeld, S. V. Fomin, and Ya G. Sinai (1982). Ergodic Theory, 
Springer, New York. 

V. Rokhlin (1964). “Exact Endomorphisms of a Lebesgue Space,” Amer. 
Math. Soc. Transl. 39, 1-36. 

In this theory, one is able to construct “natural extensions” to what are 
called exact dynamical systems, which are roughly those that preserve a 
measure on the phase space dynamics. We employed this dilation theory 
to show that certain probabilistic dynamical systems may be embedded in 
a larger deterministic dynamics of what are called Kolmogorov systems, in 
I. Antoniou and K. Gustafson (1993). “From Probabilistic Descriptions to 
Deterministic Dynamics,” Physica A 197, 153-166. 

One may usefully relate these three dilation theories intuitively as follows. 
Picture a trajectory dynamics S_t on subsets B of a phase space Ω, and
consider the L²(Ω, B, μ) space of functions f (e.g., probability densities)
defined over Ω. Then the Sz.-Nagy–Foias et al. theory dilates the functions
f; the Kolmogorov–Rokhlin et al. theory dilates the trajectory dynamics
S_t; and the Akcoglu–Sucheston et al. theory dilates the measure μ. All
induce dilations of the corresponding operator semigroups W_t.


Endnotes for Chapter 2 


One remarkable fact about mapping theorems for the numerical range is 
that there are at least some of them. For example, the simpler question 
of just polynomial mapping theorems just for the numerical radius of just 
2 x 2 matrices with real determinant and real trace shows that even the 
2 x 2 case can be quite complicated; see 

C. R. Johnson, I. M. Spitkovsky, and S. Gottlieb (1994). “Inequalities 
Involving the Numerical Radius,” Linear and Multilinear Algebra 37, 13- 
24. 

As mentioned there, even the question of when w(f(A)g(A)) < w(f(A)) 
w(g(A)) holds for A nonnormal, even for simple real polynomials such as 
f(z) = z and g(z) = z’, is unresolved. The case of general matrix products 
can therefore be expected to be much more difficult.

A natural approach to some of these questions, as we have seen, is
through dilations. One might hope in the commuting case that the dilations
of S, ||S|| ≤ 1, and T, with w(T) ≤ 1, could be combined. However,
it was shown by Davidson and Holbrook (see Section 2.5; Notes and References)
that for dilation constant ρ > 1 there are commuting operators S
and T such that w_ρ(ST) > w_ρ(T)||S||. Here the (homogeneous) operator
radii w_ρ(T) are defined by w_ρ(T) ≤ 1 if and only if T ∈ C_ρ, the Sz.-Nagy–Foias
class of operators having a strong ρ-unitary dilation; alternately,


\[ w_\rho(T) = \inf\{\gamma : \gamma > 0,\ \gamma^{-1} T \in C_\rho\}. \]


These interesting operator radii contain ||T|| = w_1(T), w(T) = w_2(T), and
r(T) = lim_{ρ→∞} w_ρ(T). See

J. A. R. Holbrook (1968). “On the power-bounded operators of Sz.-Nagy 
and Foias,” Acta Sci. Math. 29, 299-310. 

K. Okubo and T. Ando (1976). “Operator radii of commuting products,” 
Proc. Amer. Math. Soc. 56, 203-210. 
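
The three classical quantities named above, ||T|| = w_1(T), w(T) = w_2(T), and r(T), can be compared on a small example. The following is only a numerical sketch under our own assumptions: it is restricted to 2 × 2 matrices, the helper names are hypothetical, and w(T) is approximated by sampling the largest eigenvalue of Re(e^{iθ}T) over a grid of θ.

```python
import cmath
import math

def spectral_norm(T):
    """Operator norm of a 2x2 matrix: square root of the largest eigenvalue of T*T."""
    (a, b), (c, d) = T
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    lam_max = (g11 + g22) / 2 + math.sqrt(((g11 - g22) / 2) ** 2 + abs(g12) ** 2)
    return math.sqrt(lam_max)

def numerical_radius(T, steps=3600):
    """w(T) = max over theta of the largest eigenvalue of Re(e^{i theta} T), sampled on a grid."""
    (a, b), (c, d) = T
    best = 0.0
    for k in range(steps):
        z = cmath.exp(2j * math.pi * k / steps)
        h11 = (z * a).real
        h22 = (z * d).real
        h12 = (z * b + (z * c).conjugate()) / 2   # off-diagonal entry of the Hermitian part
        lam = (h11 + h22) / 2 + math.sqrt(((h11 - h22) / 2) ** 2 + abs(h12) ** 2)
        best = max(best, lam)
    return best

def spectral_radius(T):
    """Largest modulus of the two eigenvalues, from trace and determinant."""
    (a, b), (c, d) = T
    half_tr = (a + d) / 2
    root = cmath.sqrt(half_tr ** 2 - (a * d - b * c))
    return max(abs(half_tr + root), abs(half_tr - root))

# Illustrative choice: the 2x2 nilpotent shift.
T = [[0, 1], [0, 0]]
print(spectral_norm(T), numerical_radius(T), spectral_radius(T))
```

For this nilpotent matrix the three values are strictly decreasing, which shows that the radii at the two ends of the ρ-scale and at ρ = 2 can all differ.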

The only pair of dilations that can be combined (because the unitary parts 
commute) are the 1-1 Ando dilations (see Section 2.6; Notes and Refer- 
ences). 

It would be desirable to have more W(A) mapping theorems. Even 
“semiglobal” results for sectors and half-planes for certain classes of oper- 
ators A, and mappings ranging from “easy” such as polynomials to “aca- 
demic” such as the important transcendental functions, should be better 
understood. 

Such mapping theorems could be sought as motivated by very specific 
applications or situations. For example, it was shown by 
K. Gustafson (1969). “Polynomials of Infinitesimal Generators of Contrac- 
tion Semigroups,” Notices Amer. Math. Soc. 16, 767, 
see also 
K. Gustafson and G. Lumer (1972). “Multiplicative Perturbation of Semi- 
group Generators,” Pacific J. Math. 41, 731-742, 
that if A is a (generally, bounded or unbounded) contraction semigroup
generator and p(z) is a polynomial such that there exists a z_0, Re z_0 > 0,
such that the zeros λ_i of z_0 − p(z) all lie in the resolvent set ρ(A), then
p(A) will also be a contraction semigroup generator iff W(p(A)) is contained
within the left half-plane Re z ≤ 0. Contraction semigroup generators will
be discussed further in the next chapter. The point raised here is that 
it would be useful to have more understanding of sectorial or half-plane 
numerical range mapping theorems for particular operator classes. 

The potential subtlety of numerical range mapping properties appears 
also in related questions of monotonicity of functions of matrices. For 
example, consider two symmetric positive semi-definite matrices A and B
such that A ≥ B, i.e., A − B ≥ 0. Examples may be easily constructed to
show that A² − B² need not be positive. However, the fundamental result
that A^p ≥ B^p for all 0 ≤ p ≤ 1 was established by
K. Löwner (1934). “Über Monotone Matrixfunktionen,” Math. Z. 38,
177-216.

The proof requires analytic continuation into the upper half-plane. Sim- 
pler proofs have been given for the case p = 1/2. For related results see 
E. Heinz (1951). “Beiträge zur Störungstheorie der Spektralzerlegung,”
Math. Ann. 123, 415-438. 

C. Davis (1963). “Notions Generalizing Convexity for Functions Defined
on Spaces of Matrices,” in Convexity (V. Klee, ed.), Amer. Math. Soc., 
187-201. 

K. Bhagwat and A. Subramanian (1978). “Inequalities Between Means of
Positive Operators,” Math. Proc. Camb. Phil. Soc. 83, 393-401.

Recently, Löwner's power inequality has been generalized by
T. Furuta (1987). “A ≥ B ≥ 0 assures (B^r A^p B^r)^{1/q} ≥ B^{(p+2r)/q} for r ≥ 0,
p ≥ 0, q ≥ 1 with (1 + 2r)q ≥ p + 2r,” Proc. Amer. Math. Soc. 101,
85-88.

T. Furuta (1995). “Extension of the Furuta Inequality and Ando-Hiai 
Log-Majorization,” Lin. Alg. Appl. 219, 139-155. 
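
The monotonicity remarks above can be checked on a concrete pair of matrices. The sketch below is our own illustration, not taken from the cited papers: the matrices A and B are a hypothetical choice, and the helper functions are written only for real symmetric 2 × 2 matrices. It confirms that A ≥ B ≥ 0 while A² − B² fails to be positive semidefinite, whereas the square roots remain ordered, as the case p = 1/2 of Löwner's result predicts.

```python
import math

def is_psd(S, tol=1e-12):
    """A real symmetric 2x2 matrix is positive semidefinite iff trace >= 0 and det >= 0."""
    (a, b), (_, d) = S
    return a + d >= -tol and a * d - b * b >= -tol

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def psd_sqrt(S):
    """Principal square root of a nonzero 2x2 PSD matrix:
    (S + sqrt(det) I) / sqrt(trace + 2 sqrt(det))."""
    (a, b), (c, d) = S
    s = math.sqrt(max(a * d - b * c, 0.0))
    t = math.sqrt(a + d + 2 * s)
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]

# Hypothetical example pair with A - B = [[1, 1], [1, 1]] >= 0.
A = [[2.0, 1.0], [1.0, 1.0]]
B = [[1.0, 0.0], [0.0, 0.0]]

order_holds = is_psd(sub(A, B))
squares_ordered = is_psd(sub(matmul(A, A), matmul(B, B)))
roots_ordered = is_psd(sub(psd_sqrt(A), psd_sqrt(B)))
print(order_holds, squares_ordered, roots_ordered)
```

Here A² − B² = [[4, 3], [3, 2]] has determinant −1, which is why the squares are not ordered.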

Operator Trigonometry


Introduction 


The concept of the angle of an operator T was introduced in 1967 for use 
in a perturbation theory of semigroup generators. From this has developed 
what we call an operator trigonometry, whose theory and applications are 
still evolving. Its properties are intimately associated with the numerical 
range W(T) and are described in this chapter. Some applications will be 
described in the next chapter. 

Because we have played a major role in the creation of this theory, and 
because no comprehensive account of it occurs anywhere else in book form, 
we will attempt to present all the main currents of its development to date. 
In the endnotes section we will give a rather complete set of references, 
with some historical comment. 


3.1 Operator Angles 


The cosine of T in B(H) was originally defined as follows: 


(3.1-1)    \[ \cos T = \inf_{x \in D(T)} \frac{\operatorname{Re}\langle Tx, x\rangle}{\|Tx\|\,\|x\|}, \qquad x \neq 0,\ Tx \neq 0, \]

for arbitrary operators in a Banach semi-innerproduct space. We will re- 
strict attention here primarily to the case of T € B(H). Clearly, (3.1-1) 
is a real cosine defined for the real part of the numerical range of T. The 


total cosine 


\[ |\cos|\,T = \inf_{x \in H} \frac{|\langle Tx, x\rangle|}{\|Tx\|\,\|x\|}, \qquad x \neq 0,\ Tx \neq 0, \]


and imaginary cosines are similarly defined. Most of our own interest 
originally centered on semigroup theory and in particular on accretivity 
(Re ⟨Tx, x⟩ ≥ 0) preserving questions. Thus to some extent we will retain
that point of view here. Comparable half-space preserving results may be 
obtained by easy modification. 


K. E. Gustafson et al., Numerical Range 
The expression (3.1-1) defines the angle φ(T). The angle φ(T) measures
the maximum (real) turning effect of T. Later we will consider
some operator subangles φ_i(T) that measure smaller critical turning effects
of the operator T. Mixed trigonometries for, say, two operators
A and B can be built from φ(A) and φ(B). Further, one could investigate
trigonometries of algebras and other more general structures. Clearly,
φ(T) = φ(T⁻¹) = φ(cT) = φ(cT⁻¹) for any real scalar multiples c > 0.
The reader should note that cos T here is a geometrical notion, and not the
entity cos(T) defined as a power series in a functional calculus of T.

A general bound for cos T is available by elementary considerations.


Theorem 3.1-1. For any operator T, its cosine is bounded by the lower
and upper numerical radii m(T) and w(T):


(3.1-2)    \[ \frac{m(T)}{\|T\|} \le |\cos|\,T \le \frac{w(T)}{\|T\|}. \]


Proof. Both bounds follow from the elementary fact that for positive
quantities a_n, b_n, one has


\[ \inf(a_n)\,\inf(b_n) \le \inf(a_n b_n) \le \sup(a_n)\,\inf(b_n). \]


The same bounds hold for (real) cos T. The upper bound follows from


\[ \inf_{\|x\|=1} \frac{\operatorname{Re}\langle Tx, x\rangle}{\|Tx\|} \le \sup_{\|x\|=1} \operatorname{Re}\langle Tx, x\rangle \cdot \inf_{\|x\|=1} \frac{1}{\|Tx\|}, \]


and the lower bound from two infs. □
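
The bounds of Theorem 3.1-1 can be observed numerically. The following is a rough sketch under our own assumptions: the matrix T is a hypothetical nonnormal 2 × 2 example, all infima and suprema are approximated on one common finite sample of unit vectors in C², and the function names are ours. Because the same sample is used for every quantity, the elementary inf/sup inequality of the proof applies to the sampled values directly.

```python
import cmath
import math

def sample_unit_vectors(n_theta=180, n_phase=72):
    """Unit vectors (cos t, e^{is} sin t) in C^2; these cover all directions up to a global phase."""
    for i in range(n_theta + 1):
        t = (math.pi / 2) * i / n_theta
        for j in range(n_phase):
            s = 2 * math.pi * j / n_phase
            yield (math.cos(t), cmath.exp(1j * s) * math.sin(t))

def apply(T, x):
    return (T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1])

def inner(u, v):
    """Inner product <u, v>, linear in u and conjugate-linear in v."""
    return u[0] * v[0].conjugate() + u[1] * v[1].conjugate()

def norm(u):
    return math.sqrt(abs(u[0]) ** 2 + abs(u[1]) ** 2)

def cosine_bounds(T):
    """Return (m(T)/||T||, |cos|T, w(T)/||T||), each approximated on the sample."""
    forms, norms, ratios = [], [], []
    for x in sample_unit_vectors():
        Tx = apply(T, x)
        q = abs(inner(Tx, x))
        forms.append(q)
        norms.append(norm(Tx))
        ratios.append(q / norm(Tx))
    m_T, w_T, norm_T = min(forms), max(forms), max(norms)
    return m_T / norm_T, min(ratios), w_T / norm_T

# Hypothetical invertible, nonnormal example.
lower, total_cos, upper = cosine_bounds([[1.0, 1.0], [0.0, 2.0]])
print(lower, total_cos, upper)
```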

These bounds are generally not very sharp, although for operators far 
from normality, they may indeed be sharp. For the important class of 
selfadjoint operators, the cosine is known. 


Theorem 3.1-2. For T a strongly positive (m > 0) selfadjoint operator,


(3.1-3)    \[ \cos T = \frac{2\sqrt{mM}}{M+m}, \]


where m = m(T) and M = w(T).


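Proof. This is most easily seen (in retrospect; see the Notes) by use of the
Kantorovich inequality:


(3.1-4)    \[ \max_{\|y\|=1} \langle y, Ty\rangle \langle y, T^{-1}y\rangle = \frac{1}{4}\left(\sqrt{\frac{M}{m}} + \sqrt{\frac{m}{M}}\right)^{2}. \]


First, we change the cos T minimization to a maximization:


\[ \left(\inf_{x \neq 0} \frac{\langle Tx, x\rangle}{\|Tx\|\,\|x\|}\right)^{-2} = \left(\sup_{x \neq 0} \frac{\|Tx\|\,\|x\|}{\langle Tx, x\rangle}\right)^{2} = \max_{x \neq 0} \frac{\|Tx\|^{2}\,\|x\|^{2}}{\langle Tx, x\rangle^{2}}. \]

In the maximization, we then replace x with x⟨Tx, x⟩^{-1/2} so that

\[ \max_{x \neq 0} \frac{\|Tx\|^{2}\,\|x\|^{2}}{\langle Tx, x\rangle^{2}} = \max_{\langle Tx, x\rangle = 1} \|Tx\|^{2}\,\|x\|^{2} = \max_{\langle Tx, x\rangle = 1} \langle T^{1/2}x, T^{3/2}x\rangle \langle T^{1/2}x, T^{-1/2}x\rangle. \]


Then, we let y = T^{1/2}x and require that ||y||² = ⟨Tx, x⟩ = 1 so that the
maximization just written is that of Kantorovich:


\[ \max_{\|y\|=1} \langle y, Ty\rangle \langle y, T^{-1}y\rangle = \frac{1}{4}\left(\frac{M}{m} + \frac{m}{M} + 2\right) = \frac{M^{2} + m^{2} + 2mM}{4mM}. \]


Thus, cos T is the inverse square root of this quantity:


\[ \cos T = \frac{2\sqrt{mM}}{M+m}. \]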


Although we are chiefly interested in bounded operators B(H) in this
book, it is worth noting that for unbounded operators in a Hilbert space,
cos T is always 0. This fact geometrically distinguishes the topological
notions of boundedness and unboundedness for accretive operators. In
other words, an unbounded operator has no restriction on its turning rates
and may achieve turning angles arbitrarily close to 90° regardless of its
particular operator structure.


Theorem 3.1-3. If A is an accretive unbounded operator in a Hilbert 
space, then cos A = 0. 


Proof. If Re ⟨Ax, x⟩ is bounded above uniformly, cos A = 0 immediately
by the unboundedness of A; therefore, we may assume that there exists
a sequence {u_n}, ||u_n|| = 1, Re ⟨Au_n, u_n⟩ → ∞. Let w_n = η_n u_n + ξ_n v_n,
where η_n = [Re ⟨Au_n, u_n⟩]^{-α}, ξ_n = (1 − η_n²)^{1/2}, 1/2 ≤ α < 1, and v_n ∈ D(A),
||v_n|| = 1; v_n will be specifically chosen later. Then for all sufficiently large
n, if ||Av_n|| is uniformly bounded, one has by the inverse triangle inequality:


(3.1-5)
\[ R(w_n) \equiv \frac{\operatorname{Re}\langle Aw_n, w_n\rangle}{\|Aw_n\|\,\|w_n\|} \le \frac{\xi_n^{2}\operatorname{Re}\langle Av_n, v_n\rangle + \xi_n\eta_n\operatorname{Re}\langle Av_n, u_n\rangle + \eta_n\xi_n\operatorname{Re}\langle Au_n, v_n\rangle + \eta_n^{2}\operatorname{Re}\langle Au_n, u_n\rangle}{[\eta_n\|Au_n\| - \xi_n\|Av_n\|]\,|\xi_n - \eta_n|} \equiv \frac{N_1 + N_2 + N_3 + N_4}{D}, \]

with the denominator D → ∞, since |ξ_n − η_n| → 1, and ||Au_n|| · [Re ⟨Au_n,
u_n⟩]^{-α} → ∞ for 0 < α < 1; the latter may be seen as follows. Let
||u_n|| = 1, α < 1, and Re ⟨Au_n, u_n⟩ → ∞. Then, by Schwarz's inequality,
one has ||Au_n||² · [Re ⟨Au_n, u_n⟩]^{-2α} ≥ [Re ⟨Au_n, u_n⟩]^{2−2α} → ∞.

Let us now consider the four terms N_i/D separately. If ||Av_n|| is uniformly
bounded, clearly (by Schwarz's inequality) N_1/D → 0 and N_2/D → 0.
Also, N_4 = 1 if α = 1/2, N_4 → 0 if α > 1/2; thus (N_1 + N_2 + N_4)D⁻¹ → 0
for 1/2 ≤ α < 1. Therefore, if |N_3| is uniformly bounded, R(w_n) → 0 in
(3.1-5). Now, if there exists at least one nontrivial vector v ∈ D(A) ∩ D(A*)
(let it have norm = 1), then taking v_n = v, ||Av_n|| is obviously uniformly
bounded, and |N_3| = η_n ξ_n |Re ⟨u_n, A*v_n⟩| ≤ ||A*v||. If D(A) ∩ D(A*) = {0},
we may proceed as follows. Select x, y ∈ D(A), ||x|| = ||y|| = 1, ⟨x, y⟩ = 0,
and let v_n = α_n x + β_n y, |α_n|² + |β_n|² = 1. Now choose α_n, β_n so that
⟨v_n, Au_n⟩ = 0; that this can always be done is assured by taking α_n and
β_n from the solutions of the equation α_n⟨x, Au_n⟩ + β_n⟨y, Au_n⟩ = 0. Then
||v_n|| = 1, ||Av_n|| ≤ ||Ax|| + ||Ay||, N_3 = 0, and R(w_n) → 0. □

For bounded strongly accretive operators T (Re ⟨Tx, x⟩ ≥ m_T > 0 for
||x|| = 1), we know by Theorem 3.1-1 that the angle φ(T) is less than 90°.

Example. Consider the 2 × 2 matrix


\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}. \]


We have cos A = 2√2/3 ≈ 0.94281 and A thus has angle φ(A) ≈ 19.471
degrees.
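
The value in this example can be reproduced by brute force. The sketch below is our own check, with a hypothetical helper name: it minimizes the quotient in (3.1-1) over real unit vectors x = (cos t, sin t), which suffices for a diagonal matrix because the quotient depends only on |x_1| and |x_2|, and compares the result with the closed form of Theorem 3.1-2.

```python
import math

def cosine_of_diagonal(m, M, steps=20000):
    """Minimise <Ax, x> / (||Ax|| ||x||) over x = (cos t, sin t), 0 <= t <= pi/2, for A = diag(m, M)."""
    best = 1.0
    for k in range(steps + 1):
        t = (math.pi / 2) * k / steps
        c, s = math.cos(t), math.sin(t)
        quad = m * c * c + M * s * s          # <Ax, x>
        length = math.hypot(m * c, M * s)     # ||Ax||, with ||x|| = 1
        best = min(best, quad / length)
    return best

m, M = 1.0, 2.0
cos_numeric = cosine_of_diagonal(m, M)
cos_formula = 2 * math.sqrt(m * M) / (M + m)      # (3.1-3)
angle_degrees = math.degrees(math.acos(cos_formula))
print(cos_numeric, cos_formula, angle_degrees)
```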


Notes and References for Section 3.1 


The angle of an operator was introduced in 
K. Gustafson (1968a). “The Angle of an Operator and Positive Operator 
Products,” Bull. Amer. Math. Soc. 74, 488-492. 
in connection with problems in the perturbation theory of semigroup gen- 
erators, which were treated by 
K. Gustafson (1968b). “Positive (Noncommuting) Operator Products and
Semigroups,” Math. Zeit. 105, 160-172.
K. Gustafson (1968c). “A Note on Left Multiplication of Semigroup Generators,”
Pacific J. Math. 24, 463-465.
The formula (1968a, Corollary 4.4) for cos T for T selfadjoint was obtained
by the norm convexity techniques of Gustafson (1968b, Lemma 1.3).
The connection to the Kantorovich inequality was not known at that time. 
In that connection, let us take this opportunity to note a typographical 
error in Gustafson (1968b), which could otherwise lead to confusion when 
reading those earlier papers. The inequality string two-thirds down on p. 
163 appears as 


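\[ \|\epsilon B + I\| \le \|I - \epsilon^{2}B^{2}\| \cdot \|\epsilon B - I\|^{-1} \le (1 + \epsilon^{2}\|B\|^{2}) \cdot [1 - \epsilon\theta(B)]^{-1}. \]

It should read

\[ \|\epsilon B + I\| \le \|I - \epsilon^{2}B^{2}\| \cdot \|(\epsilon B - I)^{-1}\| \le (1 + \epsilon^{2}\|B\|^{2}) \cdot [1 - \epsilon\theta(B)]^{-1}. \]


There B is a bounded, dissipative (Re [Bx, x] ≤ 0) operator on a Banach
space X, ε > 0, and θ(B) = sup Re [Bx, x], ||x|| = 1, is its upper bound.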




This typographical error does not affect the validity of any of the results 
in that paper. 

The fact that cos T = 0 for T unbounded was shown by
K. Gustafson and B. Zwahlen (1969). “On the Cosine of Unbounded Op- 
erators,” Acta Sci. Math. 30, 33-34. 
This was somewhat generalized by 
P. Hess (1971). “A Remark on the Cosine of Linear Operators,” Acta Sci. 
Math. 32, 267-269. 
An example of cos T ≠ 0 for T unbounded in a Banach space was given in
K. Gustafson and M. Seddighin (1989). “Antieigenvalue Bounds,” J. Math. 
Anal. Appl. 143, 327-340. 


3.2 Minmax Equality 


The condition that B be strongly accretive, Re ⟨Bx, x⟩ ≥ m_B > 0 for
||x|| = 1, is equivalent to the existence of an interval of ε > 0 such that
||εB − I|| < 1. The minimum

(3.2-1)    \[ g_m(B) = \min_{\epsilon > 0} \|\epsilon B - I\| \]


was of interest in semigroup perturbation theory, which started this oper- 
ator trigonometry. We will defer an account of that semigroup theory to 
Section 3.4. For that theory, but more importantly for the development of 
the operator trigonometry, the following minmax result is important. 


Theorem 3.2-1 (Minmax equality). For a strongly accretive, bounded
operator B on a Hilbert space,


(3.2-2)    \[ \sup_{\|x\| \le 1}\ \inf_{\epsilon > 0} \|(\epsilon B - I)x\|^{2} = \inf_{\epsilon > 0}\ \sup_{\|x\| \le 1} \|(\epsilon B - I)x\|^{2}. \]


In particular,

(3.2-3)    \[ \sin B = g_m(B), \]

where the sine of B is defined by sin B = √(1 − cos² B).


Proof. Let us first note that the right-hand side of (3.2-2) is just g_m²(B).
We will show later that this minimum is attained uniquely. Also, we note
that the left-hand side of (3.2-2) is indeed 1 − cos² B. To see this, consider
the parabola


(3.2-4)    \[ \|(\epsilon B - I)x\|^{2} = \epsilon^{2}\|Bx\|^{2} - 2\epsilon \operatorname{Re}\langle Bx, x\rangle + 1. \]


This parabola achieves its minimum at the value ε_m(x) = Re ⟨Bx, x⟩/||Bx||²,
and the value of the minimum is


(3.2-5)    \[ 1 - \left(\operatorname{Re}\langle Bx, x\rangle / \|Bx\|\right)^{2}. \]

The supremum over x, ||x|| = 1, of this quantity is 1 − cos² B.
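
Both sides of (3.2-2) can be evaluated numerically for a small nonsymmetric matrix. The following sketch rests on our own assumptions: B is a hypothetical strongly accretive example, the computation is carried out in the real Hilbert space R², and both the supremum over x and the infimum over ε are approximated on grids. The left side uses the parabola minimum (3.2-5); the right side uses the largest eigenvalue of (εB − I)ᵀ(εB − I).

```python
import math

# Hypothetical example: its symmetric part [[1, 0.5], [0.5, 2]] is positive definite.
B = [[1.0, 1.0], [0.0, 2.0]]

def sup_inf(B, steps=20000):
    """Left side of (3.2-2): sup over real unit x of the parabola minimum (3.2-5)."""
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        Bx = (B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1])
        re_form = Bx[0] * x[0] + Bx[1] * x[1]
        best = max(best, 1.0 - re_form ** 2 / (Bx[0] ** 2 + Bx[1] ** 2))
    return best

def norm_sq(B, eps):
    """||eps*B - I||^2, the largest eigenvalue of (eps*B - I)^T (eps*B - I)."""
    a, b = eps * B[0][0] - 1.0, eps * B[0][1]
    c, d = eps * B[1][0], eps * B[1][1] - 1.0
    g11, g22, g12 = a * a + c * c, b * b + d * d, a * b + c * d
    return (g11 + g22) / 2 + math.sqrt(((g11 - g22) / 2) ** 2 + g12 ** 2)

def inf_sup(B, eps_max=2.0, steps=20000):
    """Right side of (3.2-2): min over a grid of eps in (0, eps_max] of ||eps*B - I||^2."""
    return min(norm_sq(B, eps_max * k / steps) for k in range(1, steps + 1))

lhs, rhs = sup_inf(B), inf_sup(B)
print(lhs, rhs, math.sqrt(rhs))   # the last value approximates sin B = g_m(B)
```

For this B the two sides agree at 1/2, attained at x = (1, −1)/√2 and ε = 1/2, so sin B = 1/√2 and the angle of B is 45°.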

Next, let us assure ourselves that the minimum g_m(B) is attained uniquely.
This is the case for uniformly convex Banach spaces but not generally true
otherwise. Here we give a Hilbert space proof. Let us suppose the contrary,
that ||εB − I|| dips down and then has a flat interval minimum. Let
ε_1 < ε_2 be the ends of this interval. Then, for every small δ > 0 there is
an x, ||x|| = 1, such that the parabola ||(εB − I)x||² is less than or equal
to g_m²(B) at both ε_1 and ε_2 but at their midpoint must be within δ of the
minimum flat:


(3.2-6)    \[ \|(((\epsilon_1 + \epsilon_2)/2)B - I)x\|^{2} \ge g_m^{2}(B) - \delta. \]


But no quadratic function of ε, anchored at the value 1 at ε = 0, can satisfy
this condition for arbitrarily small δ.

Let ε_m denote the unique ε > 0 at which g_m(B) is attained. The convex
curve ||εB − I||² is continuous in ε, it has left- and right-hand derivatives
for all ε, and these are equal except at a countable set of ε. Thus, we may
speak freely of the “slope” of ||εB − I||² for a dense set of ε > 0.

Consider now the curve ||εB − I||² near its unique minimum value g_m²(B).
For any chosen fixed ε just, but strictly, to the left of ε_m, ||εB − I||² slopes
downward and is strictly greater than ||ε_mB − I||². Since ||εB − I||² is a
supremum, there thus exists an x_1, ||x_1|| = 1, such that ||(εB − I)x_1||² >
||ε_mB − I||². Moreover, we may choose this x_1 so that its minimum ε_m(x_1)
lies (possibly nonstrictly) to the right of ε_m, for otherwise all of the parabolas
||(εB − I)x_1||² increasing to achieve the supremum ||εB − I||² from
below would turn upward from points ε_m(x_1) strictly to the left of ε_m, cutting
and thereby violating the downward-sloping curve ||εB − I||². Similarly,
there exists an x_2, ||x_2|| = 1, for any chosen fixed ε just, but strictly, to the
right of ε_m, such that ||(εB − I)x_2||² > ||ε_mB − I||² and such that ε_m(x_2)
lies (possibly nonstrictly) to the left of ε_m. Moreover, we can choose x_1 and
x_2 so that ||(ε_mB − I)x_1||² > g_m²(B) − δ and ||(ε_mB − I)x_2||² > g_m²(B) − δ for
any prespecified small δ > 0, so that those two parabolas pass arbitrarily
close below the minimum point g_m²(B).

Consider first the case in which, for given small δ > 0, ε_m(x_1) and ε_m(x_2)
lie strictly to the right and left of ε_m, respectively. Let x = ξx_1 + ηx_2, where
ξ and η are real and satisfy


(3.2-7)    \[ 1 = \|x\|^{2} = \xi^{2} + \eta^{2} + 2\xi\eta \operatorname{Re}\langle x_1, x_2\rangle. \]


Using (3.2-7), a simple computation shows that ||(ε_mB − I)x||² > g_m²(B) −
δ + 2ξηC, where C = Re {⟨(ε_mB − I)x_1, (ε_mB − I)x_2⟩ − (g_m²(B) − δ)⟨x_1, x_2⟩},
and by restricting ξ and η in (3.2-7) to the appropriate quadrant, we can
assure that 2ξηC ≥ 0. By choosing ξ and η not only of appropriate sign
but also near one or the other coordinate axes, we can also assure at this
point in the construction that this term 2ξηC is also arbitrarily small, but
it turns out better to let this happen automatically as a consequence of
later steps in the construction.

Next, we would like to show that ε_m(x) can be made arbitrarily close to
ε_m. Consider first the case in which we ask for exact equality, ε_m(x) = ε_m.
By a short computation, this can be seen to be equivalent to


(3.2-8)    \[ \xi^{2}\left[\operatorname{Re}\langle Bx_1, x_1\rangle\left(1 - \frac{\epsilon_m}{\epsilon_1}\right)\right] + \eta^{2}\left[\operatorname{Re}\langle Bx_2, x_2\rangle\left(1 - \frac{\epsilon_m}{\epsilon_2}\right)\right] - 2\xi\eta \operatorname{Re}\langle Bx_1, (\epsilon_m B - I)x_2\rangle = 0, \]


where ε_1 denotes ε_m(x_1) and ε_2 denotes ε_m(x_2). Since (1 − ε_m/ε_1)(1 −
ε_m/ε_2) < 0, the degenerate hyperbola (3.2-8) and the ellipse (3.2-7) have a
point in common. This assures that ε_m(x) = ε_m and that ||(ε_mB − I)x||² >
g_m²(B) − δ for arbitrarily small δ > 0. But since ε_m(x) = ε_m, the term 2ξηC
must be small, for otherwise ||(ε_mB − I)x||² would exceed ||ε_mB − I||², which
it cannot. Thus these x = x(δ) provide the supremum sequence for the left
side of (3.2-2) to equal the right side, namely, g_m²(B).

For those cases in which we cannot ask for ε_m(x) to be exactly ε_m in
the above construction, but only arbitrarily close to ε_m, we proceed in the
same way. It is desired that


(3.2-9)    \[ \epsilon_m(x) = \frac{\xi^{2}\operatorname{Re}\langle Bx_1, x_1\rangle + \eta^{2}\operatorname{Re}\langle Bx_2, x_2\rangle + 2\xi\eta\operatorname{Re}\langle Bx_1, x_2\rangle}{\xi^{2}\|Bx_1\|^{2} + \eta^{2}\|Bx_2\|^{2} + 2\xi\eta\operatorname{Re}\langle Bx_1, Bx_2\rangle} = \tilde{\epsilon}_m, \]


where ε̃_m is to be arbitrarily close to ε_m. Multiplying this out results in the
same expression (3.2-8), with ε_m replaced by ε̃_m. Provided that we are not
in an exceptional instance in which ε_1 = ε_2, we may ask for ε̃_m arbitrarily
close to ε_m strictly between ε_2 and ε_1 and proceed as before.

The special cases in which ε_m(x_1) or ε_m(x_2) coincides with ε_m correspond
to one or both branches of (3.2-8) lying along ξ or η coordinate axes. In
that case, one chooses ξ = 0 or η = 0 as the case may be; it happens then
that the 2ξηC term is killed exactly. The special case in which x_1 = x_2
corresponds to a degenerate parabola in (3.2-7). These two parallel lines
span all four quadrants, so the sign choice on ξ, η remains unrestricted.
Thus, in all cases we have constructed x with ε_m(x) arbitrarily close to ε_m
and ||(ε_mB − I)x||² arbitrarily close below g_m²(B). The supremum of such
x shows that the left side of (3.2-2) attains the right side. □

Example. Consider the 2 × 2 matrix


\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \]

considered previously. For positive, selfadjoint operators, one always has


\[ \|\epsilon A - I\| = \max\{1 - \epsilon m,\ \epsilon M - 1\}. \]


These two lines intersect at ε_m = 2/3 to provide g_m(A) = 1/3. Thus, according
to Theorem 3.2-1, one has sin(A) = 1/3. On the other hand, from
Theorem 3.1-2, we have cos(A) = 2√2/3. The compatibility of these two
characterizations is seen in sin² A + cos² A = 1.
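
The arithmetic of this example generalizes to any spectral bounds 0 < m ≤ M. The sketch below is our own restatement with hypothetical function names: the crossing point of the two lines gives ε_m = 2/(m + M) and g_m(A) = (M − m)/(M + m), and a coarse grid search over ε is included only as a sanity check.

```python
import math

def sine_and_cosine(m, M):
    """For a positive selfadjoint operator with spectral bounds 0 < m <= M."""
    eps_m = 2.0 / (m + M)                        # where 1 - eps*m and eps*M - 1 cross
    sin_A = eps_m * M - 1.0                      # common value g_m(A) = (M - m)/(M + m)
    cos_A = 2.0 * math.sqrt(m * M) / (M + m)     # Theorem 3.1-2
    return eps_m, sin_A, cos_A

def norm_eps(eps, m, M):
    """||eps*A - I|| for selfadjoint A with spectrum endpoints m and M."""
    return max(abs(1.0 - eps * m), abs(eps * M - 1.0))

eps_m, sin_A, cos_A = sine_and_cosine(1.0, 2.0)
grid_min = min(norm_eps(k / 3000.0, 1.0, 2.0) for k in range(1, 6001))
print(eps_m, sin_A, cos_A, sin_A ** 2 + cos_A ** 2, grid_min)
```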


Notes and References for Section 3.2


The minmax result was given in
K. Gustafson (1968d). “A Min-Max Theorem,” Notices Amer. Math. Soc. 
15, 799. 

The proof given here is that of 
K. Gustafson (1995). “Matrix Trigonometry,” Linear Algebra Appl. 217, 
117-140. 
In the related paper 
E. Asplund and V. Ptak (1971). “A Minimax Inequality for Operators and 
a Related Numerical Range,” Acta Math. 126, 53-62. 
it was shown that a generalization of the min-max result to εA + B holds,
but only in a Hilbert space.


3.3 Operator Deviations 


Krein (1969; see Notes) introduced, in a different semigroup context, a 
quantity he called the deviation of T, dev(T). This quantity is equivalent
to φ(T). There the following lemma was stated without proof.


Lemma 3.3-1. Let x, y, z be three unit vectors in a Hilbert space. Define
the angles φ_xy, φ_yz, φ_xz by cos φ_xy = Re ⟨x, y⟩, cos φ_yz = Re ⟨y, z⟩,
cos φ_xz = Re ⟨x, z⟩, respectively, with 0 ≤ φ_xy, φ_yz, φ_xz ≤ π. Then


(3.3-1)    \[ \phi_{xz} \le \phi_{xy} + \phi_{yz}. \]


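Proof. Let ⟨x, y⟩ = a_1 + ib_1, ⟨y, z⟩ = a_2 + ib_2, ⟨x, z⟩ = a_3 + ib_3, where the
a_i, i = 1, 2, 3, and b_i, i = 1, 2, 3, are real and |a_i|² + |b_i|² ≤ 1 for i = 1, 2, 3.
We have cos φ_xy = a_1, cos φ_yz = a_2, and cos φ_xz = a_3. Since cos α is a
decreasing function of α in the interval 0 ≤ α ≤ π (and the claim is trivial
when φ_xy + φ_yz > π), we need to prove only that


\[ \cos\phi_{xz} \ge \cos(\phi_{xy} + \phi_{yz}), \]

that is,

\[ a_3 \ge a_1 a_2 - \sqrt{1 - a_1^{2}}\,\sqrt{1 - a_2^{2}}, \]


where sin φ_xy = √(1 − a_1²) ≥ 0 and sin φ_yz = √(1 − a_2²) ≥ 0. Thus, we need


(3.3-2)    \[ \sqrt{1 - a_1^{2}}\,\sqrt{1 - a_2^{2}} \ge a_1 a_2 - a_3. \]


The result is obvious if the expression on the right of (3.3-2) is negative.
If it is nonnegative, we need to prove that


\[ (1 - a_1^{2})(1 - a_2^{2}) \ge (a_1 a_2 - a_3)^{2} \]


or equivalently,

\[ 1 - a_1^{2} - a_2^{2} + a_1^{2}a_2^{2} \ge a_1^{2}a_2^{2} - 2a_1 a_2 a_3 + a_3^{2} \]

or

(3.3-3)    \[ 1 - a_1^{2} - a_2^{2} - a_3^{2} + 2a_1 a_2 a_3 \ge 0. \]


Since the matrix


(3.3-4)    \[ G = \begin{bmatrix} \langle x, x\rangle & \langle x, y\rangle & \langle x, z\rangle \\ \langle y, x\rangle & \langle y, y\rangle & \langle y, z\rangle \\ \langle z, x\rangle & \langle z, y\rangle & \langle z, z\rangle \end{bmatrix} \]


is positive semidefinite, as is its entrywise complex conjugate, the matrix


(3.3-5)    \[ \begin{bmatrix} 1 & a_1 & a_3 \\ a_1 & 1 & a_2 \\ a_3 & a_2 & 1 \end{bmatrix} \]


is positive semidefinite, so its determinant is nonnegative, yielding (3.3-3). □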
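
Lemma 3.3-1 and the determinant condition (3.3-3) lend themselves to a randomized check. The sketch below is ours and purely illustrative: the dimension, the seed, and the number of trials are arbitrary assumptions, and the helper names are hypothetical. It draws random complex unit vectors, computes the three angles from the real parts of the inner products, and records the smallest observed slack in (3.3-1) together with the smallest determinant of the matrix (3.3-5).

```python
import math
import random

def random_unit_vector(dim, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    n = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / n for c in v]

def angle(u, v):
    """Angle in [0, pi] with cos(angle) = Re <u, v> for unit vectors u and v."""
    re = sum((a * b.conjugate()).real for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, re)))

def gram_determinant(a1, a2, a3):
    """Determinant of the real symmetric matrix (3.3-5)."""
    return 1 - a1 * a1 - a2 * a2 - a3 * a3 + 2 * a1 * a2 * a3

rng = random.Random(0)          # fixed seed so the run is repeatable
worst_gap = float("inf")        # smallest value of phi_xy + phi_yz - phi_xz seen
min_det = float("inf")
for _ in range(10000):
    x, y, z = (random_unit_vector(3, rng) for _ in range(3))
    pxy, pyz, pxz = angle(x, y), angle(y, z), angle(x, z)
    worst_gap = min(worst_gap, pxy + pyz - pxz)
    min_det = min(min_det, gram_determinant(math.cos(pxy), math.cos(pyz), math.cos(pxz)))
print(worst_gap, min_det)
```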
One now defines


\[ \operatorname{dev}(T) = \sup_{x \neq 0} \phi(Tx, x), \]


where φ(Tx, x), 0 ≤ φ ≤ π, is defined by the equation

\[ \cos(\phi(Tx, x)) = \frac{\operatorname{Re}\langle Tx, x\rangle}{\|Tx\|\,\|x\|}. \]

Notice that dev (T) is thus the same as the operator angle φ(T) of Section
3.1. From the triangle inequality of Lemma 3.3-1, one then obtains the
following.


Theorem 3.3-2. Let A and B be bounded invertible operators in a Hilbert
space. Then


(3.3-6)    \[ \phi(AB) \le \phi(A) + \phi(B). \]


Proof. From Lemma 3.3-1, we have

\[ \phi(ABx, x) \le \phi(ABx, A^{-1}x) + \phi(A^{-1}x, x) = \phi(Bx, x) + \phi(Ax, x). \]

Their suprema bear the same relation, so one has

\[ \operatorname{dev}(AB) \le \operatorname{dev}(B) + \operatorname{dev}(A). \]  □


This fact, that the maximum turning angle of AB is less than or equal to
that of A followed by that of B, is of course geometrically evident.
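
Theorem 3.3-2 can be illustrated on a pair of 2 × 2 matrices. The following sketch makes several assumptions of our own: A and B are hypothetical positive definite examples, the angle φ(T) is approximated from below by sampling unit vectors in C², and the function names are ours. Since the product AB is not symmetric, complex vectors are sampled rather than real ones.

```python
import cmath
import math

def operator_angle(T, n_theta=360, n_phase=72):
    """Approximate phi(T) in degrees by sampling unit vectors (cos t, e^{is} sin t) in C^2."""
    min_cos = 1.0
    for i in range(n_theta + 1):
        t = (math.pi / 2) * i / n_theta
        for j in range(n_phase):
            x = (math.cos(t), cmath.exp(2j * math.pi * j / n_phase) * math.sin(t))
            Tx = (T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1])
            re_form = (Tx[0] * x[0].conjugate() + Tx[1] * x[1].conjugate()).real
            length = math.sqrt(abs(Tx[0]) ** 2 + abs(Tx[1]) ** 2)
            min_cos = min(min_cos, re_form / length)
    return math.degrees(math.acos(max(-1.0, min_cos)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Hypothetical invertible examples.
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[2.0, 1.0], [1.0, 1.0]]

phi_A = operator_angle(A)
phi_B = operator_angle(B)
phi_AB = operator_angle(matmul(A, B))
print(phi_A, phi_B, phi_AB, phi_A + phi_B)
```

In this instance the sampled angle of AB stays several degrees below φ(A) + φ(B), so the inequality (3.3-6) is not tight here.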


Notes and References for Section 3.3 


Lemma 3.3-1 and Theorem 3.3-2 are due to
M. Krein (1969). “Angular Localization of the Spectrum of a Multiplicative
Integral in a Hilbert Space,” Functional Anal. Appl. 3, 89-90.

The proof of Lemma 3.3-1 was apparently first given by 
D. Rao (1972). Numerical Range and Positivity of Operator Products, 
Ph.D. thesis, University of Colorado, Boulder, Colorado. 

See also 

D. Rao (1976). “A Triangle Inequality for Angles in a Hilbert Space,” 
Revista Colombiana de Matematicas 10, 95-97. 

In the proof of Lemma 3.3-1 given here, we have incorporated a simplifica- 
tion suggested by T. Ando (1993, private communication). 

There is a third entity, independently introduced by H. Wielandt, at
about the same time, which is also essentially equivalent to the angle φ(T)
and the deviation dev (T). In his lecture notes,

H. Wielandt (1967). Topics in the Analytic theory of Matrices, University 
of Wisconsin Lecture Notes, Madison, Wisconsin. 
the maximum angle 


(A) = sup (z, Az), Az #0 


between vectors x and Az is called the singular angle of a square matrix. 
Wielandt also used (omitting proof, as Krein did) a triangle inequality 
equivalent to (3.3-1). Wielandt’s motivation, unlike that of Gustafson or 
Krein, both of whom were working on different (see the next section) semi- 
group problems, seems to have been in seeking generalizations of the Weyl 
theory for spectra of σ(A + B) in terms of σ(A) and σ(B).

Although the operator angle φ(T) was the first to appear in the refereed
journals, it seems almost certain that the three ideas φ(T), dev (T), and
+(T) were all conceived independently, in different contexts, in 1967. A full 
accounting of this may be found in 
K. Gustafson (1996). “Operator Angles (Gustafson), Matrix Singular An- 
gles (Wielandt), Operator Deviations (Krein),” Collected Works of Helmut 
Wielandt, II (B. Huppert and H. Schneider, eds.), De Gruyter, Berlin.


3.4 Semigroup Generators 


It is interesting that both the angle φ(T) and the deviation dev (T) came
from the incubator of semigroup theory. The circumstances were, however,
completely different. Let us briefly describe the motivations in the semigroup
theory that led to φ(T) and dev (T). Recall that it is desired, for
general unbounded operators T (e.g., differential operators), to define the
semigroup by


(3.4-1)    \[ e^{tT} = \text{s-}\lim_{n \to \infty} \left(I - \frac{t}{n}T\right)^{-n}. \]


Conversely, given a semigroup U(t), it is desired to characterize its infinitesimal
generator T by


(3.4-2)    \[ Tx = \text{s-}\lim_{t \to 0} \frac{(U(t) - I)x}{t}. \]

We refer to the references at the end of this section for further details of
the semigroup theory, as we will use them in the following discussion.
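
The limit (3.4-1) is easy to observe in the scalar case, where T is multiplication by a number λ with Re λ ≤ 0. The following sketch is only an illustration under that simplifying assumption, with a hypothetical function name; it does not address the operator-theoretic content of the limit.

```python
import cmath

def resolvent_power(lam, t, n):
    """(1 - (t/n) * lam)^(-n), the scalar case of the approximation in (3.4-1)."""
    return (1 - (t / n) * lam) ** (-n)

t = 1.0
for lam in (-2.0, -0.5 + 3.0j, 2.0j):
    approx = resolvent_power(lam, t, 10 ** 6)
    exact = cmath.exp(t * lam)
    # For Re(lam) <= 0 each factor has modulus at most 1, mirroring the contraction property.
    print(lam, abs(approx - exact), abs(approx) <= 1.0)
```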

In particular, let us recall the characterization theorem for the contraction
semigroup generators G(1,0) due to Hille and Yosida as modified by
Phillips and Lumer, namely, Theorem (HYPL): A ∈ G(1,0) iff A is dissipative
and λI − A is onto for all λ with Re λ > 0. This theorem holds generally
for a Banach space X equipped with a semi-inner product [x, y], and an
operator A is called dissipative if Re [Ax, x] ≤ 0 for all x ∈ D(A). This
class G(1,0) of generators of contraction semigroups ||e^{tT}|| ≤ 1 is the most
important in applications, and other classes of higher-growth semigroups
can often be reduced to it or treated by similar methods. The contraction
property generalizes the unitarity of the semigroups e^{itA} in the case that
T = iA, A a selfadjoint operator in a Hilbert space.

Thus, in Hilbert space a necessary condition for A to be such a semigroup
generator is that the numerical range W(A) be in the left half (closed)
plane. Applications often require that one work with perturbations of simpler
generators A, and one wants to know when, for example, A + B remains
a generator when A is perturbed by some relatively subordinate operator B.
A somewhat final result in that direction is given by Theorem (RKNG): if
A ∈ G(1,0) and B is a relatively small perturbation, ||Bx|| ≤ a||Ax|| + b||x||
on D(A), where a < 1, then A + B ∈ G(1,0) iff A + B is dissipative. The
RKNG stands for Rellich–Kato–Nagy–Gustafson. A similar multiplicative
perturbation result may be obtained directly from the RKNG theorem.
The multiplicative perturbation question may be posed: given A ∈ G(1,0),
when is BA ∈ G(1,0)? If one notes that BA ∈ G(1,0) iff εBA ∈ G(1,0)
for arbitrary ε > 0, then one can write the multiplicative perturbation as
an additive perturbation εBA = (εB − I)A + A and ask when (εB − I)A
satisfies the additive perturbation requirement of the RKNG theorem. Restricting
attention to the case of B bounded and everywhere defined, we
have the following theorem.


Theorem 3.4-1 (Multiplicative perturbation). For X a Banach space
with semi-inner product [x, y] and A ∈ G(1,0) an infinitesimal generator of
a contraction semigroup on X, if there is an ε > 0 such that ||εB − I|| < 1,
then BA ∈ G(1,0) iff BA is dissipative.


Proof. As described above, write εBA = (εB − I)A + A and employ the
RKNG additive perturbation theorem. □

Turning then to the remaining question of when BA is dissipative, recall
that the condition ||εB − I|| < 1 for some ε > 0 is equivalent to B being
strongly accretive: Re ⟨Bx, x⟩ ≥ m_B > 0 for ||x|| = 1. This condition is, of
course, entirely independent of A. But if we write


(3.4-3)    \[ \operatorname{Re}\langle \epsilon BAx, x\rangle = \operatorname{Re}\langle (\epsilon B - I)Ax, x\rangle + \operatorname{Re}\langle Ax, x\rangle \le \|\epsilon B - I\|\,\|Ax\|\,\|x\| + \operatorname{Re}\langle Ax, x\rangle, \]

we see that BA will be dissipative if ||εB − I|| ≤ cos(−A). This began
the operator trigonometry. The second key development of the operator
trigonometry came about by understanding, through the minmax theorem
(Theorem 3.2-1), that g_m(B) = min ||εB − I||, ε > 0, is sin B. Thus a
sufficient condition for BA to be dissipative when B is accretive and A is
dissipative is


(3.4-4)    \[ \sin B \le \cos(-A). \]


We will elaborate on this in the next section.
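
The sufficient condition (3.4-4) can be tried on a small example. The sketch below is ours and deliberately narrow: A and B are hypothetical real symmetric 2 × 2 matrices, so that the cosines are available from Theorem 3.1-2 in closed form, and dissipativity of the nonsymmetric product BA is tested through the eigenvalues of its symmetric part.

```python
import math

def spectral_bounds(S):
    """Smallest and largest eigenvalue of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = S
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mid - rad, mid + rad

def cos_selfadjoint(S):
    """cos S = 2 sqrt(mM) / (m + M) for positive definite symmetric S (Theorem 3.1-2)."""
    m, M = spectral_bounds(S)
    return 2 * math.sqrt(m * M) / (m + M)

def is_dissipative(T, tol=1e-12):
    """Re <Tx, x> <= 0 for all x iff the symmetric part of the real matrix T has no positive eigenvalue."""
    off = (T[0][1] + T[1][0]) / 2
    return spectral_bounds([[T[0][0], off], [off, T[1][1]]])[1] <= tol

minus_A = [[1.0, 0.0], [0.0, 2.0]]           # so A = -diag(1, 2) is dissipative
B = [[2.0, 1.0], [1.0, 1.0]]                 # positive definite, hence strongly accretive
A = [[-v for v in row] for row in minus_A]
BA = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

sin_B = math.sqrt(1 - cos_selfadjoint(B) ** 2)
cos_minus_A = cos_selfadjoint(minus_A)
print(sin_B, cos_minus_A, sin_B <= cos_minus_A, is_dissipative(BA))
```

Here sin B ≈ 0.745 is below cos(−A) ≈ 0.943, and the product BA is indeed dissipative, in agreement with (3.4-4).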

Similarly, it is interesting to see how the quantity dev(T) arose out
of a semigroup theory. The main need was to bound σ(T) in a sector:
|arg λ| ≤ dev (T). This came about in attempting to integrate an initial-value
problem


(3.4-5)    \[ \frac{dx(t)}{dt} = A(t)x(t), \qquad x(0) = x_0, \]


where A(t) is a time-dependent, infinitesimal generator found as the derivative
of a function F(t) of strongly bounded variation. Then the integral
W(t) = ∫_0^t exp{dF(t)} solves the initial-value problem (3.4-5) with
solution x(t) = W(t)x_0. Since W(t) is in fact a multiplicative integral
due to the multiplicative property of the exponentials, written formally,
W = Π_0^t exp dF, then by Theorem 3.3-2 above one has, again formally,


(3.4-6)    \[ \operatorname{dev}(W) \le \sum_{0}^{t} \operatorname{dev} \exp dF. \]


Since σ(W) is in a sector bounded by dev (W), this guarantees, for example,
that the negative real axis is not in σ(W). Recall that this is a useful
property when integrating systems of ordinary differential equations, and
the abstract semigroup theory follows similarly.

Let us note that when dF happens to be a selfadjoint operator H, then

(3.4-7)  $$\cos \operatorname{dev} \exp dF = \frac{2\sqrt{e^m e^M}}{e^m + e^M},$$

where m and M are the lower and upper bounds of H, respectively. Taking
the largest and smallest M and m over the whole family of H in dF then
provides a uniform bound for dev(W) in (3.4-6), and hence a uniform
bound on $\sigma(W(t))$ for the solution of the initial-value problem (3.4-5).
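
As a small numerical illustration of (3.4-7), the following Python sketch evaluates $2\sqrt{e^m e^M}/(e^m + e^M)$ for given bounds m and M. The function names are ours and purely illustrative; the comparison with $1/\cosh((M - m)/2)$ is elementary algebra added as a cross-check, not a statement taken from the text.

```python
import math


def cos_dev_exp(m: float, M: float) -> float:
    """Right-hand side of (3.4-7) for a selfadjoint H with bounds m <= M."""
    return 2.0 * math.sqrt(math.exp(m) * math.exp(M)) / (math.exp(m) + math.exp(M))


def dev_exp_degrees(m: float, M: float) -> float:
    """Angle in degrees whose cosine is cos_dev_exp(m, M)."""
    # Clamp to 1 to guard against rounding slightly above 1 when m == M.
    return math.degrees(math.acos(min(1.0, cos_dev_exp(m, M))))


if __name__ == "__main__":
    for m, M in [(0.0, 0.0), (-1.0, 1.0), (0.0, 3.0)]:
        # The two printed values should agree: the quotient depends only on M - m.
        print(m, M, cos_dev_exp(m, M), 1.0 / math.cosh((M - m) / 2.0))
```

The quotient depends only on the spread M - m, so a uniform bound on that spread over the family of H is what controls the angle.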
Good references for the semigroup theory are
T. Kato (1980). Perturbation Theory for Linear Operators, 2nd Ed., 
Springer, New York. 
E. Hille and R. Phillips (1957). Functional Analysis and Semigroups, Amer. 
Math. Soc. Colloq. Publ. 31, Providence, RI. 
The Lumer-Phillips version of the Hille-Yosida theorem is given in
G. Lumer and R. Phillips (1961). “Dissipative Operators in a Banach 
Space,” Pacific J. Math. 11, 679-698. 
For the theory of time dependent semigroup generators, see 
K. Yosida (1966). Functional Analysis, Springer, Berlin. 

For additive perturbation results, see 
K. Gustafson (1966). “A Perturbation Lemma,” Bull. Amer. Math. Soc. 
72, 334-338. 
K. Gustafson (1983). “The RKNG (Rellich-Kato-Nagy-Gustafson) Perturbation
Theorem for Linear Operators in Hilbert and Banach Space,”
Acta Sci. Math. 45, 201-211.
For multiplicative perturbation results, see 
K. Gustafson (1968c). “A Note on Left Multiplication of Semigroup Gen- 
erators,” Pacific J. Math. 24, 463-465. 
K. Gustafson and Ken-iti Sato (1969). “Some Perturbation Theorems for 
Nonnegative Contraction Semigroups,” J. Math. Soc. Japan 21, 200-204. 
K. Gustafson and G. Lumer (1972). “Multiplicative Perturbation of Semigroup
Generators,” Pacific J. Math. 41, 731-742.
For recent results, see 
R. Kumar and P. Das (1991). “Perturbation of m-Accretive Operators in 
Banach Spaces,” Nonlinear Analysis 17, 161-168. 

For the origin of dev (T) in the semigroup context, see 
M. Krein (1969). “Angular Localization of the Spectrum of a Multiplicative 
Integral in a Hilbert Space,” Functional Anal. Appl. 3, 89-90. 


3.5 Accretive Products 


In the semigroup perturbation theorems of the preceding section, the two
issues were that a perturbed generator T = A + B or T = BA remain
maximal in the sense that $\lambda I - T$ is surjective for $\operatorname{Re} \lambda > 0$, and that
T remain dissipative: $\operatorname{Re}(Tx, x) \le 0$. The maximality follows from the
now well-established index (Fredholm) theory of linear operators. The
dissipativeness of the sum A + B can often be seen quickly, due to the
subadditivity of the numerical range

$$W(A + B) \subset W(A) + W(B).$$

But the same cannot be said of the product BA. Specifically, when
$\operatorname{Re}(Ax, x) \le 0$ and $\operatorname{Re}(Bx, x) \ge 0$, can one expect that $\operatorname{Re}(ABx, x) \le 0$?
As the example given at the beginning of Chapter 2 shows, this is generally
not the case.

In this section, therefore, we consider the question of when the product 
(i.e., composition) of two accretive operators is itself accretive. It fol- 
lows from the preceding section that the operator trigonometry provides 
an interesting sufficient condition. Indeed, this question constituted the 
inception of the operator trigonometry. 


Lemma 3.5-1. Let B and A be bounded, accretive operators in a Hilbert
space. Then BA is accretive if

(3.5-1)  $$\sin B \le \cos A.$$

Proof. The inequality (3.4-3) and our understandings of sin B and cos A. □

To get more feeling about the sharpness of this operator trigonometry 
as applied to the accretive operator product question, let us note how it 
improves what one might obtain otherwise. For example, if one writes for 
strongly accretive B and A and any x

(3.5-2)  $$\operatorname{Re}(BAx, x) = \operatorname{Re}((B - I)Ax, x) + \operatorname{Re}(Ax, x),$$

then, taking $\|x\| = 1$ for convenience, bounding the two terms on the
right-hand side in the natural way,

$$|\operatorname{Re}((B - I)Ax, x)| \le \|B - I\|\,\|A\| \quad\text{and}\quad \operatorname{Re}(Ax, x) \ge m(A),$$

leads to the sufficient condition for BA to be accretive by (a):
$\|B - I\| \le m(A)/\|A\|$. Given (a), we may improve the criterion for product
accretivity by (b): put $\epsilon B$ into (3.5-2) with $\epsilon$ optimal so that
$\|\epsilon B - I\| = \sin B$ is now inserted into (3.5-2). A further sharpening,
given (b), may be had by replacing (a) with (c): divide out the $\|Ax\|$ in
(3.5-2), e.g., as in (3.4-3), thereby effectively inserting cos A into the
inequalities for product accretivity. Having considered this
criteria-sharpening sequence (a), (b), (c), let us illustrate it with an
example.


Example. Let A and B be positive selfadjoint operators with $\|A\| = \|B\| = 1$
and m(A) = 1/2, m(B) > 0. From the sufficient condition

$$\|B - I\| \le m(A)/\|A\|,$$

one can assure that BA is accretive for m(B) down to m(B) = 1/2. From
the sufficient condition

$$\sin B \le m(A)/\|A\|,$$

one can assure that BA is accretive for m(B) down to m(B) = 1/3. From
the sufficient condition

$$\sin B \le \cos A,$$

one can assure that BA is accretive for m(B) down to m(B) = 0.0295.

In the example, we used the facts that $\|B - I\| = 1 - m(B)$,
$g_m(B) = \min_\epsilon \|\epsilon B - I\| = \sin B = (\|B\| - m(B))/(\|B\| + m(B))$, and
cos A = 0.9428, so that we need only

$$\frac{1 - m(B)}{1 + m(B)} \le 0.9428,$$

from which it follows that BA will be positive whenever $m(B) \ge 0.0295$.
This is more than a 10-fold improvement over the first two sufficient
conditions and says that “most” positive, selfadjoint B when multiplied
against the given A will retain the positivity of BA.
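
The three thresholds for m(B) in this example can be recomputed with a few lines of Python. This is only a sketch under the example's assumptions (positive selfadjoint A and B with norm 1, and the selfadjoint formula $\cos A = 2\sqrt{mM}/(m + M)$ with $M = \|A\|$); the function names are ours.

```python
import math


def threshold_a(m_a: float, norm_a: float) -> float:
    """Smallest m(B) allowed by (a): ||B - I|| = 1 - m(B) <= m(A)/||A||."""
    return 1.0 - m_a / norm_a


def threshold_from_sin_bound(c: float) -> float:
    """Smallest m(B) with sin B = (1 - m(B)) / (1 + m(B)) <= c, for ||B|| = 1."""
    return (1.0 - c) / (1.0 + c)


def cos_selfadjoint(m: float, big_m: float) -> float:
    """cos A = 2 sqrt(mM) / (m + M) for positive selfadjoint A."""
    return 2.0 * math.sqrt(m * big_m) / (m + big_m)


m_a, norm_a = 0.5, 1.0
bound_a = threshold_a(m_a, norm_a)                                    # condition (a)
bound_b = threshold_from_sin_bound(m_a / norm_a)                      # condition (b)
bound_c = threshold_from_sin_bound(cos_selfadjoint(m_a, norm_a))      # condition (c)

if __name__ == "__main__":
    print(bound_a, bound_b, bound_c)
```

With unrounded arithmetic the third threshold comes out slightly below 0.0295, which is consistent with the rounded value quoted in the example.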

The sufficient condition $\sin B \le \cos A$ for a positive numerical range
W(BA) may be generalized to a notion of supersets for the whole numerical
range W(BA). For a bounded operator T on a complex Hilbert space, let
us recall the abbreviated notations

$$W(T) = \{(Tx, x) : \|x\| = 1\},$$

$$m_T = \inf_{\|x\|=1} |(Tx, x)|,$$

$$w_T = \sup_{\|x\|=1} |(Tx, x)|,$$

$$\theta_T = \sup_{z \in W(T)} |\arg z|,$$

$$\cos T = \inf \operatorname{Re}(Tx, x)/(\|x\| \cdot \|Tx\|),$$
$$|\cos|T = \inf |(Tx, x)|/(\|x\| \cdot \|Tx\|),$$
$$|\sin|T = [1 - |\cos|^2 T]^{1/2}.$$

Let D be the subset of the complex plane consisting of the semiannulus

$$z = re^{i\theta} \quad\text{with}\quad m_A m_B \le r \le w_A w_B, \quad |\theta| \le \theta_A + \theta_B.$$


Theorem 3.5-2. Let A and B be bounded operators on a Hilbert space.
Then

$$W(AB) \subset \Sigma \cup \{0\},$$ where

(3.5-3)  $$\Sigma = \bigcup_{\lambda \in D} \{ z : |z - \lambda| \le \|AB\| (|\sin|A)(|\sin|B) \}.$$

Proof. Let G denote the Grammian or Gram determinant $G = G(x_1, x_2, x_3)$
of three vectors in a Hilbert space, namely

$$G = \begin{vmatrix} (x_1, x_1) & (x_1, x_2) & (x_1, x_3) \\ (x_2, x_1) & (x_2, x_2) & (x_2, x_3) \\ (x_3, x_1) & (x_3, x_2) & (x_3, x_3) \end{vmatrix}.$$

G is always real and nonnegative, and strictly positive iff the three
vectors are linearly independent. Let $x_1 = ABx/\|ABx\|$, $x_2 = Bx/\|Bx\|$, and
$x_3 = x$, a unit vector; we assume $ABx \ne 0$ here. Also, let $(x_1, x_2) = ae^{i\alpha}$,
$(x_2, x_3) = be^{i\beta}$, and $(x_1, x_3) = \rho e^{i\gamma}$. Then by the positivity of G, we have

$$1 + 2\rho ab\cos(-\gamma + \alpha + \beta) - a^2 - b^2 - \rho^2 \ge 0$$
and thus
$$[\rho\cos\gamma - ab\cos(\alpha + \beta)]^2 + [\rho\sin\gamma - ab\sin(\alpha + \beta)]^2 \le (1 - a^2)(1 - b^2).$$

Recalling that $(ABx, x) = \|ABx\|\rho(\cos\gamma + i\sin\gamma)$, and because $a \ge |\cos|A$
and $b \ge |\cos|B$, we have

$$[\operatorname{Re}(ABx, x) - \operatorname{Re}\lambda]^2 + [\operatorname{Im}(ABx, x) - \operatorname{Im}\lambda]^2 \le \|ABx\|^2 (|\sin|A)^2 (|\sin|B)^2,$$

where $\lambda$ denotes the point $\|ABx\|ab(\cos(\alpha + \beta) + i\sin(\alpha + \beta))$. Since $\lambda$ is
in D, we have shown that (ABx, x) is in $\Sigma$.

Thus $W(AB) \subset \Sigma \cup \{0\}$. Moreover, if {0} is attained (e.g., in the case
where AB is singular), then by the convexity of W(AB) and the above
argument all rays from {0} to other points in W(AB) must be in $\Sigma$, which,
being a closed set, must therefore contain {0}. Then we may conclude that
$W(AB) \subset \Sigma$. □

As we have mentioned previously, the general numerical range mapping 
theorems of Chapter 2 do not encompass the question of when T accretive 
implies $T^2$ accretive. Lemma 3.5-1 gives us the sufficient criterion that the
real angle of T be less than or equal to 45°. But note carefully that this is
not the same as specifying that W(T) be in the ±45° sector of the complex
plane, and the counterexample given earlier shows that condition to be
insufficient. The criteria of Theorem 3.5-2, although general, are seen by
some sectorial examples to allow too large a containment superset in that 
situation. Therefore, we turn to the theory of sectorial forms to consider 
this question. Because that theory embraces both bounded and unbounded 
operators T, we are able to treat both in the following. 

Let T be a densely defined sectorial operator in a complex Hilbert space
with semiangle $\theta_T < \pi/2$ and vertex $\gamma = 0$. We refer the reader to
(Kato, 1980; see the Notes) for further details that we employ concerning
sectorial operators, and throughout we assume for simplicity that the vertex
$\gamma = 0$. Let t[u, v] be the sesquilinear form (Tu, v) with D(t) = D(T) and
with closure t[u, v]. W(T) is dense in W(t), the adjoint form t* is also
sectorial, and the two closed symmetric forms $\frac{1}{2}(t + t^*)$ and $\frac{1}{2i}(t - t^*)$
are uniquely represented by selfadjoint operators Re T and Im T such that

(3.5-4)  $$\tfrac{1}{2}(t + t^*)[u, v] = ((\operatorname{Re} T)u, v)$$

for $u \in D(\operatorname{Re} T)$, $v \in D(t) = D(t^*) = D(t + t^*) = D(t - t^*)$, and
$\frac{1}{2i}(t - t^*)[u, v] = ((\operatorname{Im} T)u, v)$ for $u \in D(\operatorname{Im} T)$, $v \in D(t)$.


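Theorem 3.5-3. Let T be a sectorial operator in a Hilbert space such
that $D(T) \subset D(\operatorname{Re} T) \cap D(\operatorname{Im} T)$ and for which there exists $b \le 1$ such
that $\|(\operatorname{Im} T)u\| \le b\|(\operatorname{Re} T)u\|$ for all $u \in D(T^2)$. Then $T^2$ is accretive and
$\theta_{T^2} \le 2\tan^{-1} b$.

Proof. For $u \in D(T)$ and arbitrary $v \in D(T)$, we have

(3.5-5)  $$(Tv, u) = \tfrac{1}{2}(t + t^*)[v, u] + \tfrac{1}{2}(t - t^*)[v, u] = (v, (\operatorname{Re} T - i\operatorname{Im} T)u),$$

so that $D(T) \subset D(T^*)$ and $T^*u = (\operatorname{Re} T - i\operatorname{Im} T)u$ on D(T). In particular,
for $u \in D(T^2)$ we have

(3.5-6)  $$(T^2u, u) = (Tu, T^*u) = ((\operatorname{Re} T + i\operatorname{Im} T)u, (\operatorname{Re} T - i\operatorname{Im} T)u) = \|(\operatorname{Re} T)u\|^2 - \|(\operatorname{Im} T)u\|^2 - 2i\operatorname{Re}((\operatorname{Re} T)u, (\operatorname{Im} T)u).$$

Since $b \le 1$, we have $\operatorname{Re}(T^2u, u) \ge 0$.
Concerning the sector angle for the numerical range of $T^2$, the case b = 1
is just a restatement of the accretivity of $T^2$, so any improvement comes
only in the case b < 1. Also, we may assume $(\operatorname{Re} T)u \ne 0$, for otherwise
$(T^2u, u) = 0$. Thus

(3.5-7)  $$\frac{|\operatorname{Im}(T^2u, u)|}{\operatorname{Re}(T^2u, u)} \le \frac{2|((\operatorname{Re} T)u, (\operatorname{Im} T)u)|}{(1 - b^2)\|(\operatorname{Re} T)u\|^2} \le \frac{2b}{1 - b^2} = \tan(2\tan^{-1} b),$$

which gives an improved bound on sector angle when b < 1. □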

For a bounded operator, the above arguments yield the following neces- 
sary and sufficient condition for the accretivity of T?. Note that in Theorem 
3.5-3 and Corollary 3.5-4 we do not need to assume that T is accretive. 


Corollary 3.5-4. Let T be a bounded operator on a Hilbert space. Then
$T^2$ is accretive iff $\|(\operatorname{Im} T)x\| \le b\|(\operatorname{Re} T)x\|$ for some $b \le 1$. In that case
$\theta_{T^2} \le 2\tan^{-1} b$.

Proof. Immediate from Theorem 3.5-3. □
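
To see the corollary at work on a concrete matrix, the sketch below uses the hypothetical 2 x 2 example $T = \begin{pmatrix} 1 & c \\ -c & 1 \end{pmatrix}$ with real c, which is our choice and not taken from the text. For this T one has Re T = I and $\|(\operatorname{Im} T)x\| = |c|\,\|x\|$, so b = |c|. The code samples random unit vectors and inspects $(T^2x, x)$; it is a numerical spot check, not a proof.

```python
import cmath
import math
import random


def t2_form(c: float, x0: complex, x1: complex) -> complex:
    """(T^2 x, x) for T = [[1, c], [-c, 1]] and x = (x0, x1)."""
    a, b = 1.0 - c * c, 2.0 * c              # T^2 = [[a, b], [-b, a]]
    y0 = a * x0 + b * x1
    y1 = -b * x0 + a * x1
    return y0 * x0.conjugate() + y1 * x1.conjugate()


def sample_t2(c: float, samples: int = 5000, seed: int = 0):
    """Return (min Re, max |arg|) of (T^2 x, x) over random unit x in C^2."""
    rng = random.Random(seed)
    min_re, max_arg = float("inf"), 0.0
    for _ in range(samples):
        x0 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        x1 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(abs(x0) ** 2 + abs(x1) ** 2)
        q = t2_form(c, x0 / norm, x1 / norm)
        min_re = min(min_re, q.real)
        max_arg = max(max_arg, abs(cmath.phase(q)))
    return min_re, max_arg


if __name__ == "__main__":
    print(sample_t2(0.5), 2.0 * math.atan(0.5))   # b = 0.5: accretive, angle bounded
    print(sample_t2(1.5))                          # b = 1.5: real part goes negative
```

For c = 0.5 the sampled angles stay below $2\tan^{-1}(0.5)$, while for c = 1.5 the real part is negative, in line with the requirement $b \le 1$.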

If in Theorem 3.5-3 T is m-sectorial, then T is its own Friedrichs
extension. But it is not generally known when T can be written as T = A + iB,
with D(A) = D(B) = D(T), A and B symmetric. Via a partial result of
this type we are able to obtain necessary and sufficient conditions for $T^2$
accretive for a certain class of m-sectorial operators T.


Theorem 3.5-5. Let T be an m-sectorial operator with $D(T) \subset D(\operatorname{Re} T)$.
Then $T \subset \operatorname{Re} T + iC$, where C is symmetric, and $T^2$ is accretive iff
$\|Cx\| \le b\|(\operatorname{Re} T)x\|$ for some $b \le 1$ and all $x \in D(T^2)$. In that case,
$\theta_{T^2} \le 2\tan^{-1} b$.


Proof. By the “angle-boundedness” factorization (Kato, 1980, p. 337),
we may write $T = (\operatorname{Re} T)^{1/2}(I + iB)(\operatorname{Re} T)^{1/2}$, where B is selfadjoint and
$\|B\| \le \tan\theta_T$. By hypothesis, $x \in D(T)$ implies $(\operatorname{Re} T)^{1/2}x \in D((\operatorname{Re} T)^{1/2})$.
Hence, by the construction of B, it may be verified that we have $x \in D(C)$,
where $C = (\operatorname{Re} T)^{1/2}B(\operatorname{Re} T)^{1/2}$.

Since $T \subset \operatorname{Re} T + iC$, we have $T^* \supset \operatorname{Re} T - iC$, and for $x \in D(T^2)$ as in
the proof of Theorem 3.5-3, we have

$$(T^2x, x) = (Tx, (\operatorname{Re} T - iC)x) = \|(\operatorname{Re} T)x\|^2 - \|Cx\|^2 - 2i\operatorname{Re}((\operatorname{Re} T)x, Cx),$$

the result following as before. □


Notes and References for Section 3.5 


The sufficient condition sin B < cos A for operator product accretivity was 
first stated in 

K. Gustafson (1968c). “A Note on Left Multiplication of Semigroup
Generators,” Pacific J. Math. 24, 463-465.

The result of Theorem 3.3-2, see
M. Krein (1969). “Angular Localization of the Spectrum of a Multiplicative
Integral in a Hilbert Space,” Functional Anal. Appl. 3, 89-90,
that $\operatorname{dev}(BA) \le \operatorname{dev}(B) + \operatorname{dev}(A)$, implies that BA is accretive when
$\operatorname{dev}(B) + \operatorname{dev}(A) \le \pi/2$, a hypothesis which implies the sufficient condition
$\sin A \le \cos B$ or $\sin B \le \cos A$ of Lemma 3.5-1.

As mentioned in Section 2.5, the commuting case is much easier. For 
example, for positive selfadjoint A and bounded accretive B, BA is always 
accretive, as observed by 
K. Gustafson (1968a). “The Angle of an Operator and Positive Operator 
Products,” Bull. Amer. Math. Soc. 74, 488-492. 

Later this was extended to $W(AB) \subset W(A)W(B)$, without B necessarily
being accretive but with A required bounded, by 

R. Bouldin (1970). “The Numerical Range of a Product,” J. Math. Anal. 
Appl. 32, 459-467. 

For A and B selfadjoint matrices, the positivity of A and that of Re AB
necessitates that of B. Furthermore, it should be noted specifically that
even when A and B are positive definite, selfadjoint operators, AB need
not be accretive in the noncommuting case. For example, consider

$$A = \begin{pmatrix} 1 + a & 1 - a \\ 1 - a & 1 + a \end{pmatrix}, \qquad B = \begin{pmatrix} 1 + 2a & 2^{1/2}(a - 1) \\ 2^{1/2}(a - 1) & 2 + a \end{pmatrix},$$

where a > 0. Then, for $a = 2^{-1}(2^{1/2} - 1)(5 + 2^{1/2})^{-1}$ and x = (1, 0) one has
$(ABx, x) \le 2^{-1}(1 - 2^{1/2}) < 0$.
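
The counterexample can be checked numerically. The sketch below assumes the matrices are $A = \begin{pmatrix} 1+a & 1-a \\ 1-a & 1+a \end{pmatrix}$ and $B = \begin{pmatrix} 1+2a & \sqrt{2}(a-1) \\ \sqrt{2}(a-1) & 2+a \end{pmatrix}$ and reads the parameter as $a = \tfrac{1}{2}(\sqrt{2} - 1)/(5 + \sqrt{2})$; the helper names are ours.

```python
import math

SQRT2 = math.sqrt(2.0)


def matrices(a: float):
    """The two symmetric 2 x 2 matrices of the example, as nested lists."""
    A = [[1 + a, 1 - a], [1 - a, 1 + a]]
    B = [[1 + 2 * a, SQRT2 * (a - 1)], [SQRT2 * (a - 1), 2 + a]]
    return A, B


def is_positive_definite(M) -> bool:
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] > 0 and det > 0


def ab_form_at_e1(a: float) -> float:
    """(ABx, x) for x = (1, 0), i.e. the (1, 1) entry of the product AB."""
    A, B = matrices(a)
    return A[0][0] * B[0][0] + A[0][1] * B[1][0]


a = 0.5 * (SQRT2 - 1.0) / (5.0 + SQRT2)

if __name__ == "__main__":
    A, B = matrices(a)
    print(is_positive_definite(A), is_positive_definite(B), ab_form_at_e1(a))
```

Both matrices pass the positive-definiteness check, yet the quadratic form of AB at x = (1, 0) is negative and lies below $\tfrac{1}{2}(1 - \sqrt{2})$.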


The superset Theorem 3.5-2 was obtained by 
D. Rao (1972). Numerical Range and Positivity of Operator Products, 
Dissertation, University of Colorado, Boulder, Colorado. 

For results on sectorial and sesquilinear forms, which we used in the 
proofs of Theorem 3.5-3 and Theorem 3.5-5, see 
T. Kato (1980). Perturbation Theory for Linear Operators, 2nd Edition,
Springer, New York. 
A number of sufficient conditions beyond those presented here for positive 
operator products are given in 
K. Gustafson and D. Rao (1977). “Numerical Range and Accretivity of 
Operator Products,” J. Math. Anal. Appl. 60, 693-702. 
See also 
K. Gustafson (1994). “Operator Trigonometry,” Linear and Multilinear 
Algebra 37, 139-159. 
The quantity cos A has another interpretation as the first antieigenvalue of
A:

(3.6-1)  $$\mu_1(A) = \inf_{\substack{x \in D(A) \\ x \ne 0,\ Ax \ne 0}} \frac{\operatorname{Re}(Ax, x)}{\|Ax\| \cdot \|x\|}.$$

Also defined were the higher antieigenvalues

(3.6-2)  $$\mu_n(A) = \inf_{\substack{x \in D(A) \\ x \perp \{x_1, \ldots, x_{n-1}\}}} \frac{\operatorname{Re}(Ax, x)}{\|Ax\| \, \|x\|},$$

where the $x_k$ were called the corresponding antieigenvectors of A. It was
proposed (see the Notes) to consider this as a spectral theory analogous to
the usual spectral theory of eigenvalues and eigenvectors. Total
antieigenvalues, e.g., the first one would be

(3.6-3)  $$|\mu_1|(A) = \inf_{Ax \ne 0} \frac{|(Ax, x)|}{\|Ax\| \, \|x\|},$$

were also defined. When one thinks of eigenvalues and their corresponding
eigenvectors, those are the vectors for which A dilates but does not turn at
all. The name “antieigenvalues” was chosen to connote the opposite: the
critical turnings of A.

By Theorem 3.1-2, we know for T strongly positive, selfadjoint that

$$\mu_1(T) = \frac{2\sqrt{\lambda_{\min}\lambda_{\max}}}{\lambda_{\min} + \lambda_{\max}},$$

where $\lambda_{\min} = m(T)$ = the lower bound of the numerical range W(T),
and $\lambda_{\max} = M(T)$ = the upper bound of W(T). Thus, it is evident that
generally one might expect relations between the antieigenvalues of T and
the eigenvalues of T, the strength of that relationship perhaps decreasing
as one departs from normal-like operators. Exact expressions for all
of the antieigenvalues and antieigenvectors for strongly accretive,
finite-dimensional normal operators have been obtained.

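Theorem 3.6-1. Let T be a normal accretive operator on a finite
dimensional Hilbert space H with eigenvalues

$$\lambda_i = \beta_i + i\delta_i, \qquad i = 1, \ldots, n.$$

Let
$$E = \{\beta_i/|\lambda_i| : 1 \le i \le n\}$$
and

(3.6-4)  $$F = \left\{ \frac{2\sqrt{(\beta_j - \beta_i)(\beta_i|\lambda_j|^2 - \beta_j|\lambda_i|^2)}}{|\lambda_j|^2 - |\lambda_i|^2} : 0 < \frac{\beta_j|\lambda_j|^2 - 2\beta_i|\lambda_j|^2 + \beta_j|\lambda_i|^2}{(|\lambda_i|^2 - |\lambda_j|^2)(\beta_i - \beta_j)} < 1,\ 1 \le i \le n,\ 1 \le j \le n,\ i \ne j \right\}.$$

Then $\mu_1(T)$ is exactly equal to the smallest number in $E \cup F$. Furthermore,
if T is diagonal and

$$\mu_1(T) = \frac{2\sqrt{(\beta_j - \beta_i)(\beta_i|\lambda_j|^2 - \beta_j|\lambda_i|^2)}}{|\lambda_j|^2 - |\lambda_i|^2},$$

then $\mu_1(T) = (Tz, z)/\|Tz\|$, for some z with

(3.6-5)  $$|z_i|^2 = \frac{\beta_j|\lambda_j|^2 - 2\beta_i|\lambda_j|^2 + \beta_j|\lambda_i|^2}{(|\lambda_i|^2 - |\lambda_j|^2)(\beta_i - \beta_j)}, \qquad |z_j|^2 = \frac{\beta_i|\lambda_i|^2 - 2\beta_j|\lambda_i|^2 + \beta_i|\lambda_j|^2}{(|\lambda_i|^2 - |\lambda_j|^2)(\beta_i - \beta_j)},$$

and $z_k = 0$ for $k \ne i$, $k \ne j$.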
Proof. See (Gustafson and Seddighin (1989), see Notes). The proof is
somewhat involved and uses the Lagrange multiplier method. Note that it
presumes that one knows all of the eigenvalues of T. □

Theorem 3.6-1 has been generalized to find all total antieigenvalues, in- 
cluding the higher ones, and their corresponding antieigenvectors. We state 
and prove this result first for the first total antieigenvalue. The proof for 
the higher antieigenvalues follows by the same methods. 


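Theorem 3.6-2. Let T be a normal operator on a finite-dimensional
Hilbert space H with eigenvalues $\lambda_i = \beta_i + i\delta_i$, i = 1, ..., n. Then, the first
total antieigenvalue is either 1 or the smallest number in the set of values

(3.6-6)  $$G = \left\{ \frac{\sqrt{(\beta_i|\lambda_j| + \beta_j|\lambda_i|)^2 + (\delta_i|\lambda_j| + \delta_j|\lambda_i|)^2}}{(|\lambda_i| + |\lambda_j|)\sqrt{|\lambda_i||\lambda_j|}} : 1 \le i \le n,\ 1 \le j \le n \right\}.$$

Moreover, if T is diagonal and $|\mu_1|(T) = 1$, then the first total
antieigenvector is $z^{(1)} = (z_1, \ldots, z_n)$, with $|z_j| = 1$ for some j and all other $z_i = 0$.
If $|\mu_1|(T)$ is one of the values in G, then the components of z satisfy
$|z_i|^2 = |\lambda_j|(|\lambda_i| + |\lambda_j|)^{-1}$, $|z_j|^2 = |\lambda_i|(|\lambda_i| + |\lambda_j|)^{-1}$, and all other $z_k = 0$.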


Proof. We assume that T has already been diagonalized, i.e., that
$\{e_1, \ldots, e_n\}$ is a basis for H with respect to which the normal operator T is
diagonal. The statements of Theorem 3.6-2 are with respect to that basis.
The theorem thus expresses the fact that the antieigenvectors of T are
generally expressible in terms of just two of the eigenvectors of T. This is a
considerable conceptual clarification as to the nature of antieigenvectors, at
least for normal T, and as will be seen it also leads to easier computations
of the higher antieigenvalues $|\mu_n|$. Of course, one would have to diagonalize
T, that is, find all its eigenvectors first.

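Let $z = (z_1, \ldots, z_n)$, $z_i = x_i + iy_i$, be any vector in H, expressed in the
eigenbasis $\{e_1, \ldots, e_n\}$ for T. Then the problem at hand is to minimize the
positive function

(3.6-7)  $$f(z) = \frac{|(Tz, z)|^2}{\|Tz\|^2} = \frac{\left(\sum_i \beta_i|z_i|^2\right)^2 + \left(\sum_i \delta_i|z_i|^2\right)^2}{\sum_{i=1}^n |\lambda_i|^2|z_i|^2}$$

on the set $\sum_i |z_i|^2 = 1$. First, suppose f attains its minimum at a
point z on this sphere with only one $|z_i| = 1$, all others vanishing. Then
$|\mu_1|(T) = 1$. Next, suppose f attains its minimum at a point z on this unit
sphere with exactly two nonvanishing components, $z_i$ and $z_j$. Then,

$$f(z) = \frac{\beta_i^2|z_i|^4 + \beta_j^2|z_j|^4 + 2\beta_i\beta_j|z_i|^2|z_j|^2 + \delta_i^2|z_i|^4 + \delta_j^2|z_j|^4 + 2\delta_i\delta_j|z_i|^2|z_j|^2}{|\lambda_i|^2|z_i|^2 + |\lambda_j|^2|z_j|^2}.$$

But in $|z_i|^2 + |z_j|^2 = 1$ we may let $x = |z_i|^2$, and then

$$f(x) = \frac{[(\beta_i - \beta_j)^2 + (\delta_i - \delta_j)^2]x^2 + 2[\beta_j(\beta_i - \beta_j) + \delta_j(\delta_i - \delta_j)]x + (\beta_j^2 + \delta_j^2)}{(|\lambda_i|^2 - |\lambda_j|^2)x + |\lambda_j|^2}.$$

The minimum of f(x) is obtained by setting f'(x) = 0. Omitting the
details, we ascertain that the numerator of f'(x) is

$$(\beta_i^2 + \delta_i^2 - \beta_j^2 - \delta_j^2)[(\beta_i - \beta_j)^2 + (\delta_i - \delta_j)^2]x^2$$
$$+\ 2(\beta_j^2 + \delta_j^2)[(\beta_i - \beta_j)^2 + (\delta_i - \delta_j)^2]x$$
$$+\ 2(\beta_j^2 + \delta_j^2)[\beta_j(\beta_i - \beta_j) + \delta_j(\delta_i - \delta_j)]$$
$$-\ (\beta_i^2 + \delta_i^2 - \beta_j^2 - \delta_j^2)(\beta_j^2 + \delta_j^2),$$

which, after dividing out the common factor $(\beta_i - \beta_j)^2 + (\delta_i - \delta_j)^2$, further
reduces to

(3.6-8)  $$N(x) = (|\lambda_i|^2 - |\lambda_j|^2)x^2 + 2|\lambda_j|^2x - |\lambda_j|^2.$$

The zeros of the latter are $x_\pm = |\lambda_j|/(|\lambda_j| \pm |\lambda_i|)$. If the root $|\lambda_j|/(|\lambda_j| - |\lambda_i|)$
is negative, it is of course not acceptable as $|z_i|^2$, and if it is positive, then
1 - x is negative and not acceptable as $|z_j|^2$. If it is zero, we are in the
previous case of a single nonzero component.

Hence

(3.6-9)  $$|z_i|^2 = \frac{|\lambda_j|}{|\lambda_i| + |\lambda_j|} \quad\text{and}\quad |z_j|^2 = \frac{|\lambda_i|}{|\lambda_i| + |\lambda_j|},$$

as claimed in the theorem. For this first total antieigenvector z, f(z) may
be written as

$$f(z) = \frac{((\beta_i - \beta_j)x + \beta_j)^2 + ((\delta_i - \delta_j)x + \delta_j)^2}{(|\lambda_i|^2 - |\lambda_j|^2)x + |\lambda_j|^2}.$$

Substituting the root $x = |\lambda_j|(|\lambda_i| + |\lambda_j|)^{-1}$ into this expression yields
(after some simplification)

(3.6-10)  $$|\mu_1|^2(T) = f(z) = \frac{(\beta_i|\lambda_j| + \beta_j|\lambda_i|)^2 + (\delta_i|\lambda_j| + \delta_j|\lambda_i|)^2}{(|\lambda_i| + |\lambda_j|)^2|\lambda_i||\lambda_j|}.$$

Hence, the first total antieigenvalue $|\mu_1|(T)$ is the square root of the
right-hand side of (3.6-10).

Now suppose that exactly three components, say, $z_i$, $z_j$, $z_k$, are nonzero
at a minimizing point $z^0$ on the unit sphere. We may proceed to rule out
this possibility. For such a $z^0$, f(z) is of the form

$$f(z) = \frac{(\beta_i|z_i|^2 + \beta_j|z_j|^2 + \beta_k|z_k|^2)^2 + (\delta_i|z_i|^2 + \delta_j|z_j|^2 + \delta_k|z_k|^2)^2}{|\lambda_i|^2|z_i|^2 + |\lambda_j|^2|z_j|^2 + |\lambda_k|^2|z_k|^2}.$$

For general three component $z = (0, \ldots, z_i, 0, \ldots, 0, z_j, 0, \ldots, 0, z_k, \ldots)$, let
$|z_i|^2 = v_1$, $|z_j|^2 = v_2$, $|z_k|^2 = v_3$, and consider the simplex,

(3.6-11)  $$V : v_1 + v_2 + v_3 = 1, \quad 0 \le v_1 \le 1, \quad 0 \le v_2 \le 1, \quad 0 \le v_3 \le 1.$$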
Let R be the minimum of the positive function $g(v_1, v_2, v_3) = \sqrt{f(z)}$ on
the simplex V. We may suppose that this minimum is attained at a strictly
interior point $(v_1^0, v_2^0, v_3^0)$ in V. But then it may be seen that we are
assured of a minimizing point on the boundary of V as well. Moreover, by
the convexity of the function $g(v_1, v_2, v_3)$, the minimum attained at the
boundary point is strictly smaller than that at the interior, contradicting
the presence of three nonzero components in the first place. Similarly, four
and higher numbers of components may be ruled out.

Since R is the smallest positive number such that the graph of
$g(v_1, v_2, v_3) - R$ touches one point of the simplex, then the intersection of
the graph with one coordinate plane, say, the $(v_i, v_j)$ plane, must cut the
side of the simplex in that plane. This implies that

(3.6-12)  $$\inf_V \frac{\sqrt{(\beta_iv_i + \beta_jv_j)^2 + (\delta_iv_i + \delta_jv_j)^2}}{\sqrt{|\lambda_i|^2v_i + |\lambda_j|^2v_j}} = R' \le R,$$

and consequently we have

$$\frac{\sqrt{(\beta_iv_i + \beta_jv_j + \beta_kv_k)^2 + (\delta_iv_i + \delta_jv_j + \delta_kv_k)^2}}{\sqrt{|\lambda_i|^2v_i + |\lambda_j|^2v_j + |\lambda_k|^2v_k}} \ge R'$$

for any $(v_i, v_j, v_k)$ with $v_i + v_j + v_k = 1$, $0 \le v_i \le 1$, $0 \le v_j \le 1$, $0 \le v_k \le 1$.
This convexity argument thus demonstrates that no more than two nonzero
components need to be considered for the antieigenvector. □
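
The two-component structure can be explored numerically. The sketch below assumes the elements of G are $|\lambda_i|\lambda_j| + \lambda_j|\lambda_i||/((|\lambda_i| + |\lambda_j|)\sqrt{|\lambda_i||\lambda_j|})$, which is the same quantity as in (3.6-10) written with complex arithmetic, and compares the smallest such value with a brute-force grid search over the weights $v_i = |z_i|^2$ for a positive diagonal T with three eigenvalues. The function names and the grid resolution are our own choices.

```python
import itertools
import math


def two_component_value(li: complex, lj: complex) -> float:
    """Candidate total antieigenvalue for the nonzero eigenvalue pair (li, lj)."""
    ai, aj = abs(li), abs(lj)
    # |li*aj + lj*ai| equals sqrt((b_i|l_j| + b_j|l_i|)^2 + (d_i|l_j| + d_j|l_i|)^2).
    return abs(li * aj + lj * ai) / ((ai + aj) * math.sqrt(ai * aj))


def first_total_antieigenvalue(eigs) -> float:
    """Smallest of 1 and the pairwise candidate values."""
    pairs = itertools.combinations(eigs, 2)
    return min([1.0] + [two_component_value(li, lj) for li, lj in pairs])


def quotient(eigs, weights) -> float:
    """|(Tz, z)| / ||Tz|| for diagonal T, with weights v_i = |z_i|^2 summing to 1."""
    num = abs(sum(l * v for l, v in zip(eigs, weights)))
    den = math.sqrt(sum(abs(l) ** 2 * v for l, v in zip(eigs, weights)))
    return num / den


def grid_minimum_3(eigs, steps: int = 300) -> float:
    """Brute-force minimum of the quotient over a grid on the simplex (n = 3)."""
    best = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            v = (i / steps, j / steps, (steps - i - j) / steps)
            best = min(best, quotient(eigs, v))
    return best


if __name__ == "__main__":
    eigs = (9.0, 16.0, 25.0)
    print(first_total_antieigenvalue(eigs), grid_minimum_3(eigs))
```

For eigenvalues 9, 16, 25 the formula picks the extreme pair (9, 25), and the grid search over all three weights does not find anything smaller.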


Theorem 3.6-3. For a normal operator as in Theorem 3.6-2, all higher 
total antieigenvalues take their values from the set G U {1}, and all cor- 
responding higher total antieigenvectors possess the same two-component 
structure given in Theorem 3.6-2. 


Proof. The arguments all dealt with minimizing the positive function
f(z) on the unit sphere $\sum_i |z_i|^2 = 1$. The only property of the $z_i$ used
was the expression of $|(Tz, z)|/(\|Tz\| \cdot \|z\|)$ in terms of them. The convexity
argument ruling out three or more components in a minimizing vector
took place completely in terms of those three $z_i$ and did not use any other
normality properties of T. □

Another approach (Mirman, 1983, see Notes) proposed some interesting 
convexity methods for the calculation of the antieigenvalues of accretive 
operators. The numerical range W (S) of an auxiliary operator S plays a 
fundamental role in these methods. 


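Theorem 3.6-4. Let T be strictly accretive, $S = \operatorname{Re} T + iT^*T$,
$\xi_1 + i\eta_1$, $\xi_2 + i\eta_2 \in W(S)$, $\xi_1 < \xi_2$; define

$$\tilde\xi = \frac{\xi_1\eta_2 - \xi_2\eta_1}{\eta_2 - \eta_1}.$$

Then $\mu_1(T) \le \tilde\mu$, where
(a) if $\eta_1 < \eta_2$, $\xi_1/2 < \tilde\xi < \xi_2/2$, then

$$\tilde\mu^2 = 4\tilde\xi\,\frac{\xi_2 - \xi_1}{\eta_2 - \eta_1};$$

(b) $\tilde\mu^2 = \xi_1^2/\eta_1$ if $\eta_1 \ge \eta_2$ or if $\eta_1 < \eta_2$ and $\tilde\xi \le \xi_1/2$;
(c) $\tilde\mu^2 = \xi_2^2/\eta_2$ if $\eta_1 < \eta_2$ and $\tilde\xi \ge \xi_2/2$.


Proof. One may verify that

(3.6-13)  $$\mu_1^2(T) = \inf\{\xi^2/\eta : \xi + i\eta \in W(S)\}.$$

Hence, the problem reduces to finding the minimum of $J(\xi, \eta) = \xi^2/\eta$ on the
line segment joining $\xi_1 + i\eta_1$ and $\xi_2 + i\eta_2$. If this minimum occurs at either
end point of the line segment, then its value is either $\xi_1^2/\eta_1$ or $\xi_2^2/\eta_2$.

If it occurs at an interior point, then at that point the line segment and
$J(\xi, \eta)$ should have parallel gradients. The equation of the line segment is

$$\eta - \eta_1 = \frac{\eta_2 - \eta_1}{\xi_2 - \xi_1}(\xi - \xi_1)$$

or

$$\frac{\eta_1 - \eta_2}{\xi_2 - \xi_1}\,\xi + \eta + \frac{\xi_1\eta_2 - \xi_2\eta_1}{\xi_2 - \xi_1} = 0.$$

Therefore, we should have for some c

$$\frac{2\xi}{\eta} = c\,\frac{\eta_1 - \eta_2}{\xi_2 - \xi_1} \quad\text{and}\quad -\frac{\xi^2}{\eta^2} = c,$$

which implies

$$\xi = \frac{2(\xi_1\eta_2 - \xi_2\eta_1)}{\eta_2 - \eta_1} \quad\text{and}\quad \eta = \frac{\xi_1\eta_2 - \xi_2\eta_1}{\xi_2 - \xi_1}$$

and therefore

$$\frac{\xi^2}{\eta} = \frac{4(\xi_1\eta_2 - \xi_2\eta_1)(\xi_2 - \xi_1)}{(\eta_2 - \eta_1)^2}.$$

Taking

$$\tilde\xi = \frac{\xi_1\eta_2 - \xi_2\eta_1}{\eta_2 - \eta_1}$$

we have

$$\frac{\xi^2}{\eta} = 4\tilde\xi\,\frac{\xi_2 - \xi_1}{\eta_2 - \eta_1}.$$

Hence, the minimum of $\xi^2/\eta$ on the line segment is one of the values
$\xi_1^2/\eta_1$, $\xi_2^2/\eta_2$, and $4\tilde\xi((\xi_2 - \xi_1)/(\eta_2 - \eta_1))$. One can actually find out which
of the three values is the smallest using conditions (a), (b), or (c) in the
theorem. □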
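
The case analysis of Theorem 3.6-4 is easy to mechanize. The sketch below takes two points $(\xi, \eta)$ of W(S), with $\xi = \operatorname{Re}(Tx, x)$ and $\eta = \|Tx\|^2$ for a unit vector x, and returns the resulting upper bound for $\mu_1(T)$. It assumes the three cases read as: (a) interior critical point when $\eta_1 < \eta_2$ and $\xi_1/2 < \tilde\xi < \xi_2/2$, (b) the endpoint value $\xi_1^2/\eta_1$, (c) the endpoint value $\xi_2^2/\eta_2$. The function name is ours.

```python
import math


def mirman_bound(p1, p2) -> float:
    """Upper bound for mu_1(T) from two points (xi, eta) of W(S), xi_1 != xi_2."""
    (x1, e1), (x2, e2) = sorted([p1, p2])        # order so that xi_1 < xi_2
    if e1 < e2:
        xi_t = (x1 * e2 - x2 * e1) / (e2 - e1)
        if x1 / 2 < xi_t < x2 / 2:
            # Case (a): the minimum of xi^2/eta is attained inside the segment.
            return math.sqrt(4 * xi_t * (x2 - x1) / (e2 - e1))
        if xi_t >= x2 / 2:
            # Case (c): the minimum is attained at the right end point.
            return x2 / math.sqrt(e2)
    # Case (b): the minimum is attained at the left end point.
    return x1 / math.sqrt(e1)


if __name__ == "__main__":
    # T = diag(1, 2): the eigenvectors give the points (1, 1) and (2, 4) of W(S).
    print(mirman_bound((1.0, 1.0), (2.0, 4.0)), 2 * math.sqrt(2) / 3)
```

For T = diag(1, 2) the two eigenvector points already reproduce the exact first antieigenvalue $2\sqrt{2}/3$, because W(S) is the segment joining them.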

Let us give one further result of this type. The proof combines convexity 
methods and the Lagrange multiplier approach. 


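Theorem 3.6-5. Let T be strictly accretive. If $\|T\| < \infty$, then

(3.6-14)  $$\mu_1^2(T) = 4\max_{t > 0}(\lambda_t t),$$

where $\lambda_t$ is the lower bound of the spectrum of the operator

(3.6-15)  $$S_t = \operatorname{Re} T - tT^*T.$$


Proof. Let $S = \operatorname{Re} T + iT^*T$, $\operatorname{Re} S = \operatorname{Re} T$, and $\operatorname{Im} S = T^*T$. From (3.6-13)
we have

$$\mu_1^2(T) = \inf\{\xi^2/\eta : \xi + i\eta \in W(S)\}.$$

Let $\ell_t$ be a straight line of support of W(S), with W(S) to the right of
$\ell_t$; then $\lambda_t$ is the lower bound of $\sigma(\operatorname{Re} T - tT^*T)$. Therefore, we should
consider the minimum of the function $J(\xi, \eta) = \xi^2/\eta$ on such lines. The
equation of such a line is $\eta = (1/t)(\xi - \lambda_t)$ or $\eta - (1/t)\xi + (1/t)\lambda_t = 0$, and
hence, by the parallel-gradient condition used in the preceding proof, we
must have

$$\frac{2\xi}{\eta} = -\frac{c}{t} \quad\text{and}\quad -\frac{\xi^2}{\eta^2} = c,$$

from which we have $\xi = 2\lambda_t$ and $\eta = \lambda_t/t$, and so
$\xi^2/\eta = 4\lambda_t^2/(\lambda_t/t) = 4t\lambda_t$. □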
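
For a positive diagonal matrix the quantities in Theorem 3.6-5 can be computed directly, which gives a quick consistency check. The sketch below reads (3.6-14) as $\mu_1^2(T) = 4\max_{t>0} t\lambda_t$ with $\lambda_t$ the smallest eigenvalue of $\operatorname{Re} T - tT^*T$; that reading, the ternary search, and the function names are our assumptions.

```python
import math


def lambda_t(diag, t: float) -> float:
    """Lower bound of the spectrum of Re T - t T*T for T = diag(d_1, ..., d_n), d_i > 0."""
    return min(d - t * d * d for d in diag)


def mu1_squared(diag, iters: int = 200) -> float:
    """4 * max over t of t * lambda_t, located by ternary search."""
    # t * lambda_t is a minimum of concave parabolas, hence concave in t,
    # and lambda_t is positive only for t < 1 / max(d).
    lo, hi = 0.0, 1.0 / max(diag)
    g = lambda t: t * lambda_t(diag, t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return 4.0 * g((lo + hi) / 2.0)


if __name__ == "__main__":
    for diag in [(1.0, 2.0), (9.0, 16.0)]:
        m, big_m = min(diag), max(diag)
        exact = (2.0 * math.sqrt(m * big_m) / (m + big_m)) ** 2
        print(diag, mu1_squared(diag), exact)
```

In both test cases the value agrees with the square of $2\sqrt{mM}/(m + M)$, the selfadjoint formula for the first antieigenvalue.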

At this point, it is useful to consider some simple examples of
antieigenvalues and antieigenvectors.


Example. Consider the 2 x 2 matrix

$$A = \begin{pmatrix} 9 & 0 \\ 0 & 16 \end{pmatrix}, \qquad \lambda_1 = m = 9, \quad \lambda_2 = M = 16.$$

We know that all antieigenvectors are of the form

$$\left( \pm\frac{\sqrt{\lambda_j}}{\sqrt{\lambda_i + \lambda_j}},\ \frac{\sqrt{\lambda_i}}{\sqrt{\lambda_i + \lambda_j}} \right),$$

where the indices i and j run through all eigenvalues. There are only a few
combinations for this example, from which immediately one finds

$$z_1 = \text{1st antieigenvector} = (-4/5, 3/5).$$

The angle that it turns A is seen from

$$\frac{(Az_1, z_1)}{\|Az_1\|} = \frac{2(12^2)}{5^2 \cdot 12} = \frac{24}{25} = \mu_1(A) = 0.96.$$

The second antieigenvector, if required to be orthogonal to $z_1$, is then seen
to be

$$z_2 = \text{2nd antieigenvector} = (3/5, 4/5),$$

which turns A by an angle whose cosine is

$$\frac{(Az_2, z_2)}{\|Az_2\|} = \frac{(3)(27) + (4)(64)}{5\sqrt{(27)^2 + (64)^2}} = \frac{337}{347.3110997} = \mu_2(A) = 0.97031.$$

Thus, the angle $\phi(A)$ is 16.260° and the second critical angle $\phi_2(A)$ would
be 13.997°. These angles are easily visualized by plotting the first
antieigenvector $z_1 = (-4, 3)$ and its image $Az_1 = (-36, 48)$, a multiple of
(-3, 4); similarly, $z_2 = (3, 4)$ and its image $Az_2 = (27, 64)$, in 2-space.

Notice, however, that

$$z_1 = (4/5, 3/5)$$


is also a first antieigenvector. Generally, antieigenvectors come in pairs, are 
not orthogonal, and linear combinations of them are not antieigenvectors. 
This has led us to an alternate formulation of the higher antieigenvalues and 
corresponding antieigenvectors, namely, a combinatorial theory of higher 
antieigenvalues corresponding to the critical angles obtained from higher 
antieigenvectors constructed from systematically deleted sets of eigenvec- 
tors of the matrix. This will be illustrated by a numerical example in the 
next chapter. 
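
The numbers in the 2 x 2 example above can be reproduced with a short Python sketch. It takes A = diag(9, 16), the first antieigenvector (-4/5, 3/5), and the orthogonal unit vector (3/5, 4/5) as the second one; the function name is ours.

```python
import math


def turning_cosine(diag, z) -> float:
    """Re(Az, z) / (||Az|| ||z||) for a real diagonal A and a real vector z."""
    az = [d * c for d, c in zip(diag, z)]
    inner = sum(a * c for a, c in zip(az, z))
    norm_az = math.sqrt(sum(a * a for a in az))
    norm_z = math.sqrt(sum(c * c for c in z))
    return inner / (norm_az * norm_z)


A = (9.0, 16.0)
mu1 = turning_cosine(A, (-4 / 5, 3 / 5))      # first antieigenvector
mu1_pair = turning_cosine(A, (4 / 5, 3 / 5))  # its partner gives the same cosine
mu2 = turning_cosine(A, (3 / 5, 4 / 5))       # orthogonal to the first

if __name__ == "__main__":
    print(mu1, math.degrees(math.acos(mu1)))
    print(mu2, math.degrees(math.acos(mu2)))
```

Both members of the antieigenvector pair turn A by the same angle, and the orthogonal vector turns it by a slightly smaller one.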


Another Example. Consider the 3 x 3 matrix

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad \lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = 2.$$

The formula for cos A for selfadjoint A immediately gives

$$\mu_1(A) = \cos(A) = \frac{2\sqrt{2}}{3} = 0.94281.$$

It is also easily seen as above that both
$$z_1 = (0, \sqrt{2}, 1),$$
$$z_2 = (0, -\sqrt{2}, 1)$$

are first antieigenvectors. Note that they are not orthogonal. If we still
persist in asking that higher antieigenvectors be orthogonal to prior
antieigenvectors, we are left in this example with the third antieigenvector

$$z_3 = (1, 0, 0),$$
which does not turn A at all and is in fact an eigenvector.

The full theory of higher antieigenvalues and their corresponding higher 
antieigenvectors, especially for arbitrary matrices and operators, is yet to 
be worked out. Perhaps several different versions will be needed, as dictated 
by the operator classes and applications considered. 

Let us turn back to the first antieigenvalue $\mu_1$ and derive a fundamental
result for it and its corresponding antieigenvector pair. For simplicity,
we consider only the case A bounded, strongly accretive. Consider the
antieigenvalue functional

$$\mu(u) = \operatorname{Re}\frac{(Au, u)}{\|Au\|\,\|u\|}.$$

Theorem 3.6-6 (Antieigenvector equation). The Euler equation for the
antieigenvalue functional $\mu(u)$ is

(3.6-16)  $$2\|Au\|^2\|u\|^2(\operatorname{Re} A)u - \|u\|^2\operatorname{Re}(Au, u)A^*Au - \|Au\|^2\operatorname{Re}(Au, u)u = 0.$$

Scalar multiples of solutions are solutions, but the solution space is
generally not a subspace. When A is normal, the Euler equation is satisfied
not only by the first antieigenvectors but also by all of the eigenvectors of
A.


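Proof. To find the Euler equation, we consider the quantity

$$\left.\frac{d\mu}{dw}\right|_{\epsilon=0} = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\left[\frac{\operatorname{Re}(A(u + \epsilon w), u + \epsilon w)}{(A(u + \epsilon w), A(u + \epsilon w))^{1/2}(u + \epsilon w, u + \epsilon w)^{1/2}} - \frac{\operatorname{Re}(Au, u)}{(Au, Au)^{1/2}(u, u)^{1/2}}\right].$$

Let the expression under the limit on the right-hand side be denoted
$R_A(u, w, \epsilon)$. We have

$$\epsilon R_A(u, w, \epsilon) = [\operatorname{Re}(Au, u) + 2\epsilon\operatorname{Re}((\operatorname{Re} A)u, w) + \epsilon^2(Aw, w)](Au, Au)^{1/2}(u, u)^{1/2}/D$$
$$-\ [(Au, Au) + 2\epsilon\operatorname{Re}(Au, Aw) + \epsilon^2(Aw, Aw)]^{1/2}[(u, u) + 2\epsilon\operatorname{Re}(u, w) + \epsilon^2(w, w)]^{1/2}\operatorname{Re}(Au, u)/D,$$

where D is the common denominator

$$D = (A(u + \epsilon w), A(u + \epsilon w))^{1/2}(u + \epsilon w, u + \epsilon w)^{1/2}(Au, Au)^{1/2}(u, u)^{1/2}.$$

At this point, in deriving the Euler equations for eigenvalues of a
selfadjoint operator, one gets a fortuitous cancellation of the
$\epsilon$-independent terms in the expression analogous to $\epsilon R_A(u, w, \epsilon)$, and the
Euler equation immediately follows. Although that fortuitous situation does
not occur here, we may attempt to mimic it by expanding the two square
root bracket expressions of the second numerator term

$$[(Au, Au) + x(\epsilon)]^{1/2} = (Au, Au)^{1/2} + \tfrac{1}{2}(Au, Au)^{-1/2}x(\epsilon) - \tfrac{1}{8}(Au, Au)^{-3/2}x^2(\epsilon) + \cdots$$

and

$$[(u, u) + y(\epsilon)]^{1/2} = (u, u)^{1/2} + \tfrac{1}{2}(u, u)^{-1/2}y(\epsilon) - \tfrac{1}{8}(u, u)^{-3/2}y^2(\epsilon) + \cdots,$$

where $x(\epsilon)$ and $y(\epsilon)$ are the $\epsilon$-dependent terms, respectively, and where $\epsilon$
is sufficiently small relative to (Au, Au) and (u, u), respectively. Then we
obtain $(Au, Au)^{1/2}(u, u)^{1/2}\operatorname{Re}(Au, u)$ term cancellations, from which

$$\epsilon R_A(u, w, \epsilon) = \frac{[2\epsilon\operatorname{Re}((\operatorname{Re} A)u, w) + \epsilon^2(Aw, w)](Au, Au)^{1/2}(u, u)^{1/2}}{D} - \frac{\operatorname{Re}(Au, u)[(u, u)^{1/2}r(\epsilon) + (Au, Au)^{1/2}s(\epsilon)]}{D},$$

where $r(\epsilon)$ and $s(\epsilon)$ denote the remainder terms in the square root series
expansions above. To be specific,

$$r(\epsilon) = \tfrac{1}{2}(Au, Au)^{-1/2}x(\epsilon) - \tfrac{1}{8}(Au, Au)^{-3/2}x^2(\epsilon) + \cdots,$$

$$s(\epsilon) = \tfrac{1}{2}(u, u)^{-1/2}y(\epsilon) - \tfrac{1}{8}(u, u)^{-3/2}y^2(\epsilon) + \cdots,$$

where
$$x(\epsilon) = 2\epsilon\operatorname{Re}(Au, Aw) + \epsilon^2(Aw, Aw),$$
$$y(\epsilon) = 2\epsilon\operatorname{Re}(u, w) + \epsilon^2(w, w).$$
We may now divide by $\epsilon$, from which
$$D \cdot R_A(u, w, \epsilon) = [2\operatorname{Re}((\operatorname{Re} A)u, w) + \epsilon(Aw, w)]\,\|Au\|\,\|u\|$$
$$-\ \operatorname{Re}(Au, u)[\|Au\|^{-1}\|u\|\operatorname{Re}(Au, Aw) + O(\epsilon)]$$
$$-\ \operatorname{Re}(Au, u)[\|u\|^{-1}\|Au\|\operatorname{Re}(u, w) + O(\epsilon)].$$
Note also that by the above expansions,
$$D = [\|Au\| + O(\epsilon)][\|u\| + O(\epsilon)]\,\|Au\|\,\|u\| \to \|Au\|^2\|u\|^2$$

as $\epsilon \to 0$.

Thus, in the $\epsilon \to 0$ limit of $R_A(u, w, \epsilon)$, we arrive at

$$\left.\frac{d\mu}{dw}\right|_{\epsilon=0} = \frac{2\operatorname{Re}((\operatorname{Re} A)u, w)\,\|Au\|^2\|u\|^2 - \operatorname{Re}(Au, u)[\|u\|^2\operatorname{Re}(Au, Aw) + \|Au\|^2\operatorname{Re}(u, w)]}{\|Au\|^3\|u\|^3}.$$

Setting this expression to zero yields
$$2\|Au\|^2\|u\|^2\operatorname{Re}((\operatorname{Re} A)u, w) - \operatorname{Re}(Au, u)[\|u\|^2\operatorname{Re}(A^*Au, w) + \|Au\|^2\operatorname{Re}(u, w)] = 0,$$
for arbitrary w, and hence the Euler equation
$$2\|Au\|^2\|u\|^2(\operatorname{Re} A)u - \|u\|^2\operatorname{Re}(Au, u)A^*Au - \|Au\|^2\operatorname{Re}(Au, u)u = 0. \quad \Box$$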


As the Euler equation is homogeneous (of order 5), scalar multiples of 
solutions are solutions. This was immediate from the variational quotient
(3.6-1) for $\mu$, but as it turns out the implications are greater from the Euler
equation (3.6-16): for selfadjoint or normal operators, all eigenvectors also 
satisfy it. The details of the verification are left to the reader. 
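
A numerical spot check of the Euler equation is straightforward in the simplest setting. The sketch below assumes the equation reads $2\|Au\|^2\|u\|^2(\operatorname{Re} A)u - \|u\|^2\operatorname{Re}(Au, u)A^*Au - \|Au\|^2\operatorname{Re}(Au, u)u = 0$ and restricts to a real diagonal A and real u, so that Re A = A and A*A = A²; the function name and the test matrix diag(2, 1) are our choices.

```python
import math


def euler_residual(diag, u):
    """Left-hand side of the Euler equation for real diagonal A and real u."""
    au = [d * c for d, c in zip(diag, u)]
    aau = [d * c for d, c in zip(diag, au)]        # A*A u = A^2 u here
    n_au2 = sum(c * c for c in au)                 # ||Au||^2
    n_u2 = sum(c * c for c in u)                   # ||u||^2
    re = sum(a * c for a, c in zip(au, u))         # Re(Au, u)
    # (Re A)u = Au because A is real and symmetric in this sketch.
    return [2 * n_au2 * n_u2 * a - n_u2 * re * b - n_au2 * re * c
            for a, b, c in zip(au, aau, u)]


if __name__ == "__main__":
    A = (2.0, 1.0)
    r2 = math.sqrt(2.0)
    print(euler_residual(A, (1.0, r2)))      # antieigenvector: residual ~ 0
    print(euler_residual(A, (1.0, 0.0)))     # eigenvector: residual = 0
    print(euler_residual(A, (2.0, r2)))      # sum of the two: residual != 0
```

The last line illustrates why the solution set is not a subspace: the sum of an antieigenvector and an eigenvector fails the equation.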

The following example is useful to see what is going on here. 

Example. Let 


A=(j i and x= (21,22) = (£1, Àz1), 


with A real. Then |z| = (1 + A)?/2|a,|, (Az,z} = (2 + A?)|z,|?, and 


|| Ax|| = (4 + A?)+/2|2,|, so the antieigenvalue functional p(x) is 
A 2 
ra= Ret4z.2) (244) O 
Azile (A4 + 5A? + 4)1/2 


Let us also consider the usual Rayleigh quotient 


(Az,rz) 2+? 


QQ) = (x,c) 142° 


Then, for large $\lambda \to \infty$, we have $F(\lambda) \to 1$, $Q(\lambda) \to 1$, 1 being the smaller
eigenvalue of $A$, and $x \to (0, 1)$, the corresponding eigenvector. For small
$\lambda \to 0$, we have $F(\lambda) \to 1$, $Q(\lambda) \to 2$, i.e., to the larger eigenvalue of $A$, and
$x \to (1, 0)$, the corresponding eigenvector. As we know from our general
theory, $F(\lambda)$ attains its minimum $\mu_1 = 2\sqrt{2}/3 = 0.9428090416$ at the first
antieigenvectors $x = (\pm 1, \sqrt{2})$, i.e., at $\lambda = \pm\sqrt{2}$. Checking this against
$F(\lambda)$ above, we have $F(\pm\sqrt{2}) = 4/(4 + 10 + 4)^{1/2} = 0.9428090416$.
Calculating the derivative, we have 


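$$
\begin{aligned}
F'(\lambda) &= \frac{(\lambda^4 + 5\lambda^2 + 4)^{1/2}(2\lambda) - (2 + \lambda^2)\,\tfrac12(\lambda^4 + 5\lambda^2 + 4)^{-1/2}(4\lambda^3 + 10\lambda)}{\lambda^4 + 5\lambda^2 + 4} \\
&= \frac{(\lambda^4 + 5\lambda^2 + 4)(4\lambda) - (2 + \lambda^2)(4\lambda^3 + 10\lambda)}{2(\lambda^4 + 5\lambda^2 + 4)^{3/2}} \\
&= \frac{\lambda(\lambda^2 - 2)}{(\lambda^4 + 5\lambda^2 + 4)^{3/2}}.
\end{aligned}
\tag{3.6-17}
$$

Thus $F'(\lambda) = 0$ at $\lambda = 0$ and $\lambda = \pm\infty$, corresponding to eigenvectors,
and at $\lambda = \pm\sqrt{2}$, corresponding to antieigenvectors.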
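
The example can be checked numerically. The sketch below is only an illustration, not part of the original argument: it assumes $A = \mathrm{diag}(2, 1)$ and $x = (1, \lambda)$ as in the example, and the helper names `F`, `F_prime` and `euler_residual` are hypothetical. It locates the minimum of $F$ by a crude grid search and evaluates the left side of the Euler equation at an antieigenvector and at an eigenvector.

```python
import math

def F(lam):
    """Antieigenvalue functional of A = diag(2, 1) along x = (1, lam)."""
    return (2 + lam**2) / math.sqrt(lam**4 + 5 * lam**2 + 4)

def F_prime(lam):
    """Closed form lam (lam^2 - 2) / (lam^4 + 5 lam^2 + 4)^(3/2)."""
    return lam * (lam**2 - 2) / (lam**4 + 5 * lam**2 + 4) ** 1.5

def euler_residual(u):
    """Left side of the Euler equation for A = diag(2, 1) and a real vector u."""
    a = (2.0, 1.0)                              # diagonal of A; here Re A = A
    Au = [a[i] * u[i] for i in range(2)]
    AAu = [a[i] * Au[i] for i in range(2)]      # A*A u for real diagonal A
    nu2 = sum(c * c for c in u)                 # ||u||^2
    nAu2 = sum(c * c for c in Au)               # ||Au||^2
    re = sum(Au[i] * u[i] for i in range(2))    # Re (Au, u)
    return [2 * nAu2 * nu2 * Au[i] - nu2 * re * AAu[i] - nAu2 * re * u[i]
            for i in range(2)]

# crude grid search for the minimiser of F on [0, 5]
grid = [i / 10000 for i in range(50001)]
lam_min = min(grid, key=F)
res_anti = euler_residual([1.0, math.sqrt(2.0)])   # first antieigenvector
res_eig = euler_residual([1.0, 0.0])               # eigenvector for eigenvalue 2
```

Both residuals vanish up to rounding, in line with the remark that eigenvectors and antieigenvectors satisfy the same Euler equation.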
Notes and References for Section 3.6


The terminology “antieigenvalues” and “antieigenvectors” was introduced 
by 

K. Gustafson (1972). “Antieigenvalue Inequalities in Operator Theory,” in 
Inequalities III, Proceedings Los Angeles Symposium, 1969, ed. O. Shisha, 
Academic Press, 115-119. 

Upper bounds for $\mu_1(T)$ for $T$ a finite-dimensional, strongly accretive
normal operator were obtained in

C. Davis (1980), “Extending the Kantorovich Inequalities to Normal Ma- 
trices,” Linear Algebra Appl. 31, 173-177. 

The antieigenvalues and antieigenvectors for finite-dimensional normal op- 
erators, Theorems 3.6-1 to 3.6-3, were obtained by 

K. Gustafson and M. Seddighin (1989), “Antieigenvalue Bounds,” J. Math. 
Anal. Appl. 143, 327-340. 

K. Gustafson and M. Seddighin (1993), “A Note on Total Antieigenvec- 
tors,” J. Math. Anal. Appl. 178, 603-611. 

The convexity methods using the numerical range $W(S)$, where $S =
\mathrm{Re}\,T + iT^*T$, giving Theorem 3.6-4, were developed by
B. Mirman (1983), “Antieigenvalues: Method of Estimation and Calcula- 
tion,” Linear Algebra Appl. 49, 247-255. 

The Euler equation of Theorem 3.6-6 was known and casually mentioned
at the 1969 Inequalities III Symposium (Gustafson, 1972), but the proof was
not published in full until
K. Gustafson (1995), “Antieigenvalues,” Linear Algebra Appl. 208/209, 
437-454. 
Because both antieigenvectors and eigenvectors, in the selfadjoint and nor- 
mal operator cases, satisfy this Euler equation, this theory constitutes a 
significant extension of the Rayleigh—Ritz variational theory of eigenvec- 
tors. 

In the paper just mentioned, the new, combinatorial selection theory of 
eigenvectors, especially higher antieigenvectors, was formulated. This is 
advanced further by a computational example in 
K. Gustafson (1995), “Matrix Trigonometry,” Linear Algebra Appl. 217, 
117-140. 

That example shows a combinatorial selection of higher antieigenvalues 
(i.e., smaller critical angles) developing during an iterative calculation. 

Also in the latter paper, the minmax equality theorem (3.2-1) is revis- 
ited, with the proof now indicating a two-component nature for antieigen- 
vectors for general (strongly accretive) operators. How much of that two- 
component structure can be represented in terms of eigenvectors remains 
to be further explored. 

Some further exposition of the antieigenvalue theory may be found in 
K. Gustafson (1996). Lectures on Computational Fluid Dynamics, Mathe- 
matical Physics, and Linear Algebra, Kaigai Publications, Tokyo. 
Other good names for antieigenvalues and antieigenvectors would be
anglevalues and anglevectors, turning angles and turning vectors, and so on.
Perhaps, in deference to the German eigenvalue and eigenvector, the even- 
tual names should be winkelvalue and winkelvector. Winkel means “angle” 
or “corner” in German, angle becomes vinkel in Swedish and Danish, but 
in Dutch winkel becomes shop, presumably descending from “the shop on 
the corner”! Indeed, more descriptive names for eigenvalue and eigenvector 
would be dilationvalue and dilationvector, or their equivalent in another 
language such as German or French (e.g., in the former, streckenvalue or 
dehnenvalue)? Surely, we should accept evolutionary precedents and the 
terms “eigenvalue” and “eigenvector”, and that is why the same precedent 
evolved the terms “antieigenvalue” and “antieigenvector”. Moreover, as we 
have seen in this chapter, the latter in some ways are composed of the for- 
mer, again reinforcing the importance of accepting the priority of the term 
eigen (self, inherent) to the values and vectors to which it has become
historically attached. 


4 


Numerical Analysis 


Introduction 


The fact that numerical range and numerical analysis carry the same ad- 
jective is a historical accident. Indeed, numerical range derives from the 
German Wertvorrat, meaning value field, and as we saw in Chapter 1, it 
originated as the continuous range of bilinear forms. Numerical analysis, on 
the other hand, usually connotes a conversion from continuous to discrete 
and then to a computation yielding precise numbers. 

Nonetheless, there are interesting connections between numerical range 
and numerical analysis—most importantly, application of the former to 
the latter. The spectrum of an operator underlies many parts of numerical 
analysis, and that spectrum is contained within the numerical range of that 
operator. To the extent of this common tie of both the numerical range 
and numerical analysis to the spectrum, we can expect fruitful interplay. 

It turns out, for example, that the convergence rate of the standard nu- 
merical analysis steepest descent algorithm is exactly sin A. This rather 
beautiful new result connecting numerical analysis and numerical range 
carries over to the conjugate gradient algorithm, where the convergence
rate is seen to be $\sin(A^{1/2})$. After presenting these recent results, we go
back to the general numerical analysis of initial-value problems and exam- 
ine the important Von Neumann—Kreiss stability theorem in terms of the 
numerical range. Then we give a very short treatment of the role of nu- 
merical range considerations in computational fluid dynamics, both from 
the finite-difference and finite-element points of view. From there we turn 
to the workhorse Lax—Wendroff scheme for hyperbolic conservation laws, 
seeing that its stability condition is that of a numerical range contained in 
the unit disk. Then the recent extension of that theory to spectral methods 
and the corresponding notion of pseudo-eigenvalues is presented. 
4.1 Optimization Algorithms


The method of steepest descent is one of the basic numerical methods in 
optimization theory. In steepest descent for the minimization of a function 
f, a basic algorithm is 


$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)^T. \tag{4.1-1}$$
If we restrict attention to the quadratic case, where

$$f(x) = \frac{(x, Ax)}{2} - (x, b),$$

where $A$ is a symmetric, positive definite matrix with eigenvalues $0 < m =
\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n = M$, then the point of minimum $x^*$ solves the linear
system

$$Ax^* = b.$$


For the quadratic minimization (i.e., the linear solver problem Az = b), 
the descent algorithm becomes 


$$x_{k+1} = x_k - \frac{\|y_k\|^2}{(Ay_k, y_k)}\, y_k, \tag{4.1-2}$$
where $y_k = Ax_k - b$ is called the residual error. Letting
$$E_A(x) = \frac{((x - x^*), A(x - x^*))}{2} = f(x) + \frac{(x^*, Ax^*)}{2}$$


measure the error in the iterates, one has the fundamental Kantorovich 
error bound 


$$E_A(x_{k+1}) \le \left(1 - \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}\right) E_A(x_k).$$

But in terms of $\lambda_1 = m$ and $\lambda_n = M$ and the operator trigonometry of
Chapter 3, this becomes

$$
\begin{aligned}
E_A(x_{k+1}) &\le (1 - \mu_1^2(A))\,E_A(x_k) \\
&= (1 - \cos^2 A)\,E_A(x_k) \\
&= (\sin^2 A)\,E_A(x_k).
\end{aligned}
$$

Thus the error rate of the method, in the $A^{1/2}$ norm $E_A(x)^{1/2}$, is exactly
$\sin A$.


Theorem 4.1-1 (Trigonometric convergence). In quadratic steepest descent,
for any initial point $x_0$, there holds at every step $k$

$$E_A(x_{k+1}) \le (\sin^2 A)\,E_A(x_k). \tag{4.1-3}$$

Proof. This follows from the discussion above and Theorem 3.1-2. $\square$
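
The per-step bound can be observed numerically. The following sketch is an illustration only: the diagonal test matrix, right-hand side and helper names are assumptions chosen so that $m$ and $M$ can be read off directly. It runs the iteration (4.1-2) and records the ratios $E_A(x_{k+1})/E_A(x_k)$, each of which should stay at or below $\sin^2 A = ((M - m)/(M + m))^2$.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def energy_error(A, x, x_star):
    """E_A(x) = ((x - x*), A(x - x*)) / 2."""
    d = [a - b for a, b in zip(x, x_star)]
    return 0.5 * dot(d, matvec(A, d))

def steepest_descent_step(A, b, x):
    """One step of (4.1-2): x - (||y||^2 / (Ay, y)) y with y = Ax - b."""
    y = [a - bi for a, bi in zip(matvec(A, x), b)]
    yy = dot(y, y)
    if yy == 0.0:                      # already at the solution
        return x
    alpha = yy / dot(matvec(A, y), y)
    return [xi - alpha * yi for xi, yi in zip(x, y)]

# illustrative data (assumed): diagonal A, so m = 1 and M = 9
A = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 9.0]]
b = [1.0, 1.0, 1.0]
x_star = [1.0, 1.0 / 3.0, 1.0 / 9.0]
m, M = 1.0, 9.0
sin2 = ((M - m) / (M + m)) ** 2        # sin^2 A = 0.64

x = [0.0, 0.0, 0.0]
ratios = []
for _ in range(20):
    e_old = energy_error(A, x, x_star)
    x = steepest_descent_step(A, b, x)
    ratios.append(energy_error(A, x, x_star) / e_old)
```

Printing `max(ratios)` next to `sin2` shows how close the observed contraction comes to the trigonometric bound for this particular starting point.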
We may interpret this result geometrically as follows. The first antieigenvalue
$\mu_1(A) = \cos A$ measures the maximum turning capability of $A$. Thus
the angle $\phi(A)$ is a fundamental constraint on iterative methods involving
$A$. Steepest descent does a good job but cannot converge faster than the
maximum distance from $x$ to $Ax$, which is represented trigonometrically
after normalization by the quantity $\sin A$.

More generally, in optimization theory for nonquadratic problems, one 
uses the Hessian of the objective function to guide descent by quadratic 
approximation. The smallest and largest eigenvalues m and M of the Hes- 
sians at each iteration point then determine the convergence rate of the 
method. Thus under rather general conditions (e.g., objective function
$f(x_1, \ldots, x_n)$ with continuous second partial derivatives and a local
minimum to which you are converging), the objective values $f(x_k)$ converge
to the minimum linearly with convergence rate bounded by $\sin H$ of the
Hessian.

In (Luenberger (1984); see the Notes) it is shown how one may use such 
convergence theory to compare methods. Let us observe that the use of 
sin A can sharpen such comparisons. 


Example. One approach to minimizing $f$ is to solve the equations of the
necessary local minimization condition $\nabla f(x) = 0$. It has been proposed
to apply steepest descent to the function $h(x) = |\nabla f(x)|^2$. Considering for
simplicity only the case of quadratic $f(x) = \frac12(x, Ax) - (b, x)$, for which
$\nabla f = Ax - b$, it may be seen that this means that steepest descent is to
be applied to the function

$$h(x) = (x, A^2x) - 2(x, Ab) + |b|^2.$$

But we know by Theorem 4.1-1 that steepest descent applied to this
quadratic function $h$ will have convergence rate governed by $\sin(A^2)$. This will
always be a slower convergence than that which would accrue simply by
using steepest descent in the first place.

The convergence ratio of the two methods may be compared for large
condition number $\kappa$, which in this case is $\kappa = M/m$. By the approximations


(4.1-4) (S=) ~ (1 — 6-2)4, 


it follows that it takes about $\kappa$ steps of the proposed method to match one
step of ordinary steepest descent. In terms of our operator trigonometric 
theory, the comparison is more explicit. The ratio of convergence rates in 
the $A^2$ inner product to that in the $A$ inner product is
$$\frac{\sin(A^2)}{\sin A} = \frac{(M + m)^2}{M^2 + m^2} = 1 + \frac{2mM}{M^2 + m^2} = 1 + \cos(A^2).$$
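
The "about $\kappa$ steps" statement can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: it takes the per-step bound of Theorem 4.1-1 for ordinary steepest descent, $\sin^2 A = ((\kappa - 1)/(\kappa + 1))^2$, and the same bound for the proposed method with $A$ replaced by $A^2$, whose condition number is $\kappa^2$. The function names are hypothetical.

```python
import math

def ordinary_ratio(kappa):
    """Per-step bound sin^2 A = ((kappa - 1) / (kappa + 1))^2."""
    return ((kappa - 1) / (kappa + 1)) ** 2

def proposed_ratio(kappa):
    """Per-step bound sin^2(A^2); A^2 has condition number kappa^2."""
    return ((kappa**2 - 1) / (kappa**2 + 1)) ** 2

def steps_to_match(kappa):
    """Number n with proposed_ratio(kappa)**n == ordinary_ratio(kappa)."""
    return math.log(ordinary_ratio(kappa)) / math.log(proposed_ratio(kappa))
```

For example, `steps_to_match(100.0)` is very close to 100, and the agreement improves as $\kappa$ grows.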
Example. Let $A$ be any symmetric square matrix with $m = 1/4$ and
$M = 1$. Then $A^2$ has lower bound $m^2 = 1/16$ and upper bound $M^2 = 1$.
For these two matrices $A$ and $A^2$, we thus have
$$
\begin{aligned}
\cos A &= 4/5 = 0.8, \\
\sin A &= 3/5 = 0.6, \\
\cos A^2 &= 8/17 = 0.47059, \\
\sin A^2 &= 15/17 = 0.88235.
\end{aligned}
$$

Thus the angles of these operators are
$$\phi(A) = 36.8699^\circ, \qquad \phi(A^2) = 61.9275^\circ.$$
The convergence rates are thus

$$E_A(x_{k+1}) \le 0.36\,E_A(x_k) \tag{4.1-5}$$

and
$$E_{A^2}(x_{k+1}) \le 0.77854\,E_{A^2}(x_k),$$

respectively.
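
The numbers in this example follow from $\cos A = 2\sqrt{mM}/(m + M)$ and $\sin A = (M - m)/(M + m)$. A minimal sketch that recomputes them is given below; the function name `operator_trig` is a hypothetical helper, not notation from the text.

```python
import math

def operator_trig(m, M):
    """Return (cos A, sin A, angle in degrees) from the extreme eigenvalues m <= M."""
    cos_a = 2.0 * math.sqrt(m * M) / (m + M)
    sin_a = (M - m) / (M + m)
    return cos_a, sin_a, math.degrees(math.acos(cos_a))

cos_A, sin_A, phi_A = operator_trig(0.25, 1.0)           # A with m = 1/4, M = 1
cos_A2, sin_A2, phi_A2 = operator_trig(0.25**2, 1.0)     # A^2 with m^2, M^2
rate_A, rate_A2 = sin_A**2, sin_A2**2                    # per-step error factors
```

The same values also illustrate the ratio identity $\sin(A^2)/\sin A = 1 + \cos(A^2)$, since $0.88235/0.6 = 1.47059$.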


Notes and References for Section 4.1 


The sin A steepest descent trigonometric convergence rate result Theorem 
4.1-1 was first presented in 

K. Gustafson (1991). “Antieigenvalues in Analysis,” Proceedings of the 
Fourth International Workshop in Analysis and its Applications, June 1- 
10, 1990, Dubrovnik, Yugoslavia, eds. C. Stanojevic and O. Hadzic, Novi 
Sad, Yugoslavia, 57-69. 

Further details are given in 

K. Gustafson (1994). “Operator Trigonometry,” Linear and Multilinear 
Algebra 37, 139-159. 

For optimization methods and steepest descent methods, see 

D. Luenberger (1984). Linear and Nonlinear Programming, 2nd Edition, 
Addison-Wesley. 

The angle of an operator $\phi(A)$ of Section 3.1 is not the same notion
as the angle between subspaces, which one finds in the invariant subspace
theory, e.g., see
G. Stewart and J. G. Sun (1990). Matrix Perturbation Theory, Academic
Press, Boston,
for an account of this theory, which goes back to Krein, Krasnoselski,
Milman, Paige, Kato, Davis, Kahan, Wielandt, and others. The operator angle
$\phi(A)$ is really a two-dimensional concept, focusing attention on the most
noninvariant one-dimensional subspaces!

In the same vein, the fundamental new geometrical understanding of 
the Kantorovich bound given in the trigonometric convergence theorem 
(Theorem 4.1-1) is not the same as the Kantorovich—Wielandt inequality: 
an excellent summary of the Kantorovich—Wielandt inequality is given in 
[HJ1], which we discovered only during the writing of this book. The
angle $\theta$ in the Kantorovich-Wielandt inequality is motivated entirely by
the condition number $\kappa = \lambda_n/\lambda_1$ and is defined by $\cot(\theta/2) = \lambda_n/\lambda_1$.
Then, the Kantorovich-Wielandt inequality states that

$$|(Ax, Ay)| \le \cos\theta\,\|Ax\|\,\|Ay\|$$

for all pairs of mutually orthogonal vectors $x$ and $y$. The geometrical
interpretation of this inequality is that the smallest angle between $Ax$ and
$Ay$ is at most $\theta$ and hence $\theta$ means the minimum attainable angle between
$Ax$ and $Ay$ as $x$ and $y$ range over all orthonormal pairs of vectors.
However, during this writing we have found that a wonderful, heretofore
unnoticed connection between this angle $\theta$ and our $\phi(A)$ indeed exists:
$\cos\phi(A^2) = \sin\theta$! This connection between two quite distinct geometrical
perspectives is elaborated in 
K. Gustafson (1996c). “The Geometrical Meaning of the Kantorovich- 
Wielandt Inequalities,” to appear. 
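
The identity $\cos\phi(A^2) = \sin\theta$ can be checked directly from the two definitions. The sketch below assumes $A$ symmetric positive definite with extreme eigenvalues $\lambda_1 \le \lambda_n$, writes $\kappa = \lambda_n/\lambda_1$, and uses the formula $\cos(A^2) = 2mM/(M^2 + m^2)$ from Section 4.1 with $m = \lambda_1$, $M = \lambda_n$.

```latex
\[
\sin\theta = \frac{2\tan(\theta/2)}{1 + \tan^2(\theta/2)}
           = \frac{2\kappa^{-1}}{1 + \kappa^{-2}}
           = \frac{2\kappa}{\kappa^2 + 1},
\qquad
\cos\phi(A^2) = \frac{2\lambda_1\lambda_n}{\lambda_1^2 + \lambda_n^2}
              = \frac{2\kappa}{\kappa^2 + 1}.
\]
```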


4.2 Conjugate Gradient 


Steepest descent algorithms are sometimes slow to converge. By contrast, 
conjugate gradient methods have the advantage of converging, if one ignores 
roundoff, in N iterations, for an N x N symmetric positive definite matrix 
A. Similar to the theory for steepest descent, one knows that in the A 
inner product error measure 


$$E_A(x) = \frac{((x - x^*), A(x - x^*))}{2},$$

the conjugate gradient error rate is governed by (see Luenberger, 1984;
Section 4.1 References)

$$E_A(x_k) \le 4\left(\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right)^{2k} E_A(x_0)$$
for any initial guess $x_0$. Remembering that the condition number $\kappa(A) =
M/m$, we may rewrite this as

$$\|x_k - x^*\|_{A^{1/2}} \le 2\left(\frac{M^{1/2} - m^{1/2}}{M^{1/2} + m^{1/2}}\right)^{k} \|x_0 - x^*\|_{A^{1/2}},$$

from which the following result ensues.


Theorem 4.2-1. For $A$ a positive-definite symmetric matrix, for any
initial guess $x_0$, the conjugate gradient iterates $x_k$ converge to the solution
$x^*$ of $Ax = b$ with error rate

$$\|x_k - x^*\|_{A^{1/2}} \le 2\,(\sin(A^{1/2}))^k\,\|x_0 - x^*\|_{A^{1/2}}. \tag{4.2-1}$$

Proof. This follows from the above discussion and the fact that the
spectrum $\sigma(A^{1/2}) = (\sigma(A))^{1/2}$ by the spectral mapping theorem. Recall that
for selfadjoint $T$ one knows that $\sin T = (M_T - m_T)/(M_T + m_T)$. $\square$
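
The passage from the energy bound to the norm bound can be written out as follows. This is a sketch that assumes $\|z\|_{A^{1/2}}$ denotes $\|A^{1/2}z\| = (z, Az)^{1/2}$, so that $E_A(x) = \tfrac12\|x - x^*\|_{A^{1/2}}^2$.

```latex
\[
\tfrac12\|x_k - x^*\|_{A^{1/2}}^2
\le 4\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{2k}
    \tfrac12\|x_0 - x^*\|_{A^{1/2}}^2
\quad\Longrightarrow\quad
\|x_k - x^*\|_{A^{1/2}}
\le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k}
    \|x_0 - x^*\|_{A^{1/2}},
\]
\[
\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}
= \frac{M^{1/2} - m^{1/2}}{M^{1/2} + m^{1/2}}
= \sin(A^{1/2}),
\qquad \kappa = M/m .
\]
```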

Example. We reverse the example of the previous section. Let $A$ be
any real symmetric, positive definite $N \times N$ matrix with largest eigenvalue
$\lambda_{\max} = M = 1$ and smallest eigenvalue $\lambda_{\min} = m = 1/16$. Then the
steepest descent algorithm has convergence rate governed by the $\sin A$ error
bound of Theorem 4.1-1, namely

$$E_A(x_{k+1}) \le 0.77854\,E_A(x_k),$$

whereas the increased efficiency of the conjugate gradient algorithm is
governed by the $\sin A^{1/2}$ bound of Theorem 4.2-1, namely

$$E_A(x_{k+1}) \le 0.36\,E_A(x_k).$$


Let us now look at some numerical computations on simple examples. 
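Example. Let
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix},$$
for which we know $\cos A = 0.9428090416$ and the angle $\phi(A) = 19.47122063^\circ$.

Let us solve $Ax = b$,
$$b = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$
first by steepest descent and then by conjugate gradient. The solution
$$x^* = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}$$
will be sought from initial guess
$$x_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$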
Here are the results for steepest descent, to five iterations:

    x_n                   E_A(x_n)    mu_1(x_n)    phi(x_n)
    (0,      0)
    (0.6667, 0.6667)      0.08333     0.94868      18.435
    (0.8889, 0.4444)      0.00926     0.94868      18.435
    (0.963,  0.5185)      0.00103     0.94653      18.820
    (0.9877, 0.4938)      0.00011     0.94868      18.435
    (0.9959, 0.5021)      0.00001     0.94842      18.482


Here $\mu_1(x_n) = (Ax_n, x_n)/(\|Ax_n\|\,\|x_n\|)$ measures the cosine of the angle of
each iteration. This leads us to remark that the error $E_A(x_n)$ decreases in
very near correspondence to its $\sin A$ trigonometric convergence rate
estimate. The angle of each iteration is close to $\phi(A)$. Such behavior has been
observed previously in steepest descent, but here it is seen trigonometrically,
i.e., in terms of the numerical range trigonometric theory.
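
The steepest descent figures for this $2 \times 2$ example can be regenerated with a short script. The sketch below is illustrative only; it assumes $A = \mathrm{diag}(1, 2)$, $b = (1, 1)$ and $x_0 = (0, 0)$ as in the example, and the helper names are hypothetical.

```python
import math

A_DIAG = (1.0, 2.0)        # A = diag(1, 2)
B = (1.0, 1.0)
X_STAR = (1.0, 0.5)

def sd_step(x):
    """One steepest descent step (4.1-2) for the diagonal matrix A."""
    y = [A_DIAG[i] * x[i] - B[i] for i in range(2)]            # residual Ax - b
    alpha = sum(c * c for c in y) / sum(A_DIAG[i] * y[i] ** 2 for i in range(2))
    return [x[i] - alpha * y[i] for i in range(2)]

def energy(x):
    """E_A(x) = ((x - x*), A(x - x*)) / 2."""
    return 0.5 * sum(A_DIAG[i] * (x[i] - X_STAR[i]) ** 2 for i in range(2))

def mu1(x):
    """Cosine of the angle between x and Ax."""
    Ax = [A_DIAG[i] * x[i] for i in range(2)]
    inner = sum(Ax[i] * x[i] for i in range(2))
    return inner / (math.hypot(*Ax) * math.hypot(*x))

rows = []
x = [0.0, 0.0]
for n in range(1, 6):
    x = sd_step(x)
    c = mu1(x)
    rows.append((n, x[0], x[1], energy(x), c, math.degrees(math.acos(c))))
```

In this run each step reduces $E_A$ by the factor $1/9 = \sin^2 A$, which is the "very near correspondence" with the trigonometric rate noted above.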

Using the conjugate gradient algorithm, we find

    x_n                   E_A(x_n)    mu_1(x_n)    phi(x_n)
    (0,      0)
    (0.2941, 0.5882)      0.25692     0.97619      12.529
    (1,      0.5)         3.5e-15     0.94868      18.435

Next, we consider a less trivial example. 
Example. Let
$$A = \begin{bmatrix} 20 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

From $\sin A = 19/21$, we have the angle $\phi(A) = 64.7912347^\circ$. Let us solve
$Ax = b = (1, 1, 1, 1)$, for which the solution $x^* = (0.05, 0.1, 0.5, 1)$ is sought
from initial guess $x_0 = (0, 0, 0, 0)$.

Steepest descent converges very slowly. To achieve an error of $10^{-6}$, 64
iterations were required. We leave most of them out, showing only the
beginning and final values in the following table.

    x_n                                     E_A(x_n)     phi(x_n)
    (0,      0,      0,      0)
    (0.1212, 0.1212, 0.1212, 0.1212)        0.5826       42.757
    (0.0078, 0.1043, 0.1815, 0.1912)        0.4464       48.550
    (0.1030, 0.0994, 0.2533, 0.2824)        0.3464       57.141
    (0.0180, 0.0999, 0.2929, 0.3399)        0.2710       47.236
    (0.0905, 0.1000, 0.3399, 0.4148)        0.2133       57.029
      ...
    (0.0501, 0.1000, 0.5000, 0.9985)        1.40e-06     42.758
    (0.4994, 0.1000, 0.5000, 0.9987)        1.14e-06     42.793
    (0.4994, 0.1000, 0.5000, 0.9987)        9.26e-07     42.758
Again, we see error decreasing like the $\sin^2 A = 0.8186$ trigonometric
convergence estimate. However, the angle of each iteration is less than $\phi(A)$, as
it must also accommodate critical subangle effects due to higher
antieigenvectors other than those composed of just the first and last eigenvectors.
Using the conjugate gradient algorithm, we find 


    x_n                                     E_A(x_n)     phi(x_n)
    (0,       0,       0,      0)
    (0.0594,  0.0297,  0.0059, 0.0030)      0.7667       13.526
    (0.0498,  0.1063,  0.0265, 0.0133)      0.7112       21.526
    (0.0500,  0.09997, 0.5901, 0.3062)      0.2488       41.676
    (0.04995, 0.1000,  0.5000, 1.0000)      2.9e-08      42.745
    (0.0500,  0.1000,  0.5000, 1.0000)      2.9e-10      42.758


Again, we note the departure from the extreme angle $\phi(A)$ to what we may
call a mixed convergence angle.
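
The finite termination of conjugate gradient mentioned at the start of this section can be seen in a few lines. The sketch below is an assumption-laden illustration: it implements the textbook (Hestenes-Stiefel) form of the algorithm for a diagonal matrix, started at $x_0 = 0$, and is not claimed to be the variant that produced the tables above, so its intermediate iterates need not agree with them. The function name is hypothetical.

```python
def conjugate_gradient(a_diag, b, steps):
    """Textbook CG for A = diag(a_diag) and right-hand side b, started at x0 = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual b - A x0
    p = list(r)                       # first search direction
    rr = sum(c * c for c in r)
    iterates = [list(x)]
    for _ in range(steps):
        if rr == 0.0:                 # exact solution reached
            break
        Ap = [a_diag[i] * p[i] for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(c * c for c in r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
        iterates.append(list(x))
    return iterates

its4 = conjugate_gradient((20.0, 10.0, 2.0, 1.0), (1.0, 1.0, 1.0, 1.0), 4)
its2 = conjugate_gradient((1.0, 2.0), (1.0, 1.0), 2)
```

Up to roundoff, the last entry of `its4` is $(0.05, 0.1, 0.5, 1)$ after $N = 4$ steps, and the last entry of `its2` is $(1, 0.5)$ after two steps.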


Notes and References for Section 4.2 


The $\sin(A^{1/2})$ conjugate gradient convergence rate was first given by

K. Gustafson (1994). “Operator Trigonometry,” Linear and Multilinear 
Algebra 37, 139-159. 

The examples above were taken from 

K. Gustafson (1994). “Antieigenvalues,” Linear Algebra Appl. 208/209, 
437-454. 

K. Gustafson (1995). “Matrix Trigonometry,” Linear Algebra Appl. 217, 
117-140. 

The 4 x 4 example is the one we referred to in the previous Section 3.6, 
showing the emergence of combinatorially selected higher antieigenvectors. 

New trigonometric interpretations of many computationally important 
recent variants of the conjugate gradient algorithm, such as GCR, GCR(k), 
GCG, PCG, Orthomin, CGN, GMRES, CGS, BCG, and others, will be 
found in 
K. Gustafson (1996). “Trigonometric Interpretation of Iterative Methods,” 
in Proc. Conf. on Algebraic Multilevel Iteration Methods with Applications, 
(O. Axelsson, ed.), June 13-15, Nijmegen, Netherlands, 23-29. 

It should be remarked that the angle $\phi(A)$ is continuous (for invertible $A$,
and in the operator norm $\|A\|$) in $A$ and therefore so are its trigonometric
functions. Thus, trigonometric criteria for numerical schemes can generally
be expected to be stable under small perturbations (e.g., computational
roundoff errors in the elements of $A$ or in iteration matrices related to $A$).
In this sense, the fundamental role of the operator angle $\phi(A)$ in iterative
algorithms reproduces the same virtue as that of the numerical range $W(A)$,
perturbation stability.
4.3 Discrete Stability


In Section 3.4, we briefly mentioned the abstract theory of initial-value 
problems 


$$\frac{du(t)}{dt} = Au, \quad t > 0, \qquad u(0) = u_0. \tag{4.3-1}$$


If $A$ is an m-dissipative operator, then the solution $u(t)$ is generally given by
the semigroup $u(t) = e^{At}u_0$ for $u_0$ in the domain of $A$. Important instances
of this theory occur in ordinary differential equations, partial differential 
equations, quantum mechanics, and various engineering disciplines. 

When one wishes a numerical solution of such problems, one must ap- 
proximate the continuous problem (4.3-1) by a discrete one. By this we 
mean that instead of evolving forward over continuous time t > 0, the 
solution will evolve forward in discrete time steps At. Also, the operator 
A must be approximated in a finite way to enable concrete computation. 
This is accomplished in the finite-difference method by replacing A by a 
discrete version defined on a specified finite grid of points $x_i$.

A classic example is the heat equation 


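$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \qquad u(0) = f(x) \text{ given}. \tag{4.3-2}$$
A standard easy discretization is the forward Euler (in time), central
difference (in space) approximation:
$$
\begin{aligned}
\frac{\partial u}{\partial t} &\approx \frac{u(x, t + \Delta t) - u(x, t)}{\Delta t}, \\
\frac{\partial^2 u}{\partial x^2} &\approx \frac{u(x + \Delta x, t) - 2u(x, t) + u(x - \Delta x, t)}{(\Delta x)^2}.
\end{aligned}
\tag{4.3-3}
$$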


Given appropriate boundary conditions to uniquely determine the solution,
the solution is then calculated by propagating forward over the discrete
$(\Delta x, \Delta t)$ lattice in space-time. Labeling the $x$ points on this lattice $x_i =
i\Delta x$ and the $t$ points $t_j = j\Delta t$, and calling the discrete system solution
$U_{i,j} = U(x_i, t_j)$, the propagation algorithm becomes

$$U_{i,j+1} = U_{i,j} + \frac{\Delta t}{(\Delta x)^2}\,[U_{i+1,j} - 2U_{i,j} + U_{i-1,j}]. \tag{4.3-4}$$

This scheme is usually called the Euler explicit scheme.

There are better schemes than the Euler scheme, but it is conceptually
simple and illustrates the basic ideas, so it will serve us here. We will ignore
boundary-value considerations and just look at the numerical solutions to
such a discretized initial-value problem. These solutions evolve forward in
time as a sequence of matrix iterations.


For example, we may write the explicit Euler scheme (4.3-4) more simply 
as 


$$U^{j+1} = C(\Delta t)\,U^j, \tag{4.3-5}$$
where
$$C(\Delta t) = I + (\Delta t)A_D \tag{4.3-6}$$

and where $A_D$ denotes the discrete centered difference approximation to
$\partial^2 u/\partial x^2$ in (4.3-3). $C(\Delta t)$ plays the role of the semigroup for the discrete
problem, just as $U(\Delta t) = e^{A\Delta t}$ was the semigroup for the continuous
problem. Note that $C(\Delta t)$ is the linear (first two terms) approximation to the
formal Taylor series expansion of $e^{A_D\Delta t}$. If we look at

$$
\begin{aligned}
C(\Delta t)C(\Delta t)u &= (I + 2\Delta t A_D + (\Delta t)^2 A_D^2)u \\
&= C(2\Delta t)u + (\Delta t)^2 A_D^2 u,
\end{aligned}
$$
we can see indeed that to first order $C(\Delta t)$ is a semigroup.
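
The first-order semigroup property can be checked on a small grid. The sketch below is an illustration under assumptions that are not in the text: a five-point grid, zero values outside the grid, and hypothetical helper names. It forms $C(\Delta t) = I + \Delta t\,A$ for the centered second-difference matrix and measures how far $C(\Delta t)^2$ is from $C(2\Delta t) + (\Delta t)^2 A^2$.

```python
def second_difference(n, dx):
    """n x n centred second-difference matrix (values outside the grid taken as 0)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = -2.0 / dx**2
        if i > 0:
            A[i][i - 1] = 1.0 / dx**2
        if i < n - 1:
            A[i][i + 1] = 1.0 / dx**2
    return A

def euler_operator(A, dt):
    """C(dt) = I + dt * A."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) + dt * A[i][j] for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n, dx, dt = 5, 0.2, 0.01               # assumed grid and time step
A_D = second_difference(n, dx)
C = euler_operator(A_D, dt)
lhs = matmul(C, C)                     # C(dt) C(dt)
C2 = euler_operator(A_D, 2 * dt)       # C(2 dt)
A2 = matmul(A_D, A_D)
defect = max(abs(lhs[i][j] - C2[i][j] - dt**2 * A2[i][j])
             for i in range(n) for j in range(n))
```

The value of `defect` is at rounding level, so the only departure from the semigroup law is the explicit $(\Delta t)^2 A_D^2$ term.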
Recalling from (3.4-2) that one recovers the infinitesimal generator from
the semigroup by the strong limit of the difference quotient as $t \to 0$, which
for the heat equation would be

$$Au = \frac{\partial^2 u}{\partial x^2} = \operatorname*{s-lim}_{\Delta t \to 0} \frac{(e^{A\Delta t} - I)u}{\Delta t} \tag{4.3-7}$$

for all $u$ in $D(A)$, and noting that from (4.3-6) we may write

$$A_D u = \frac{C(\Delta t) - I}{\Delta t}\,u, \tag{4.3-8}$$

we may expect that if we have done a good job of discretizing the original
problem, then

$$\|(A_D - A)u(t)\| \to 0 \quad\text{as } \Delta t \to 0 \tag{4.3-9}$$



for all solutions $u(t)$ of (4.3-1) for all good initial data $u_0(x)$. This is called
the consistency condition for a discretization: that on all true solutions the 
discretized operator approximates the continuous one with truncation error 
going to zero as the discretization tends to the continuum. Most natural 
discretizations have this property, although often it is not easy to prove it. 

Given the consistency or assumed consistency of the discrete equation, 
one then comes to one of the most important concepts in the numerical 
analysis of linear initial-value problems, the Lax equivalence theorem: the
discrete solutions converge to the continuous solution iff the scheme is
stable. In one sense, this is a numerical version of the Banach theorem: $T$ is
onto iff $(T^*)^{-1}$ is bounded. However, in numerical as well as the analytical
settings, you must define your “stability” carefully, and then prove that the 
Banach or Lax “theorems” actually hold for the operators and stability of 
interest. 
Stability comes in many versions, but in our present setting it means that
the successive iterates of the operators $C(\Delta t)$ of the scheme are uniformly
bounded,

$$\|(C(\Delta t))^n\| \le M \tag{4.3-10}$$
for all small $0 < \Delta t \le \tau$, on the interval of interest $0 \le n\Delta t \le T$, $n =
1, 2, 3, \ldots$. Otherwise, we might fear that the discrete solution would
blow up in finite time as the grid size goes to zero. On the other hand,
merely restricting the iteration matrices to be bounded in (4.3-10) may at 
first seem too weak to guarantee convergence. However, the consistency 
condition (4.3-9) is strong, and with additional assumptions of smoothness 
of solutions, one can usually use specific properties of the discretized scheme 
to bound what are called local discretization errors and then, using those 
combined with (4.3-10), prove the convergence of the method. Conversely, 
by use of the uniform boundedness principle, one can show that convergence 
of all discrete solutions $U_D$ to $u$ implies (4.3-10).

We may now reconnect our discussion to the numerical range. There are 
essentially two (related) connections within the present context. The first 
is the power-boundedness theorem (Theorem 2.1-1), whereby we know that
if the numerical radius $w(C) \le 1$, then the same is true for all powers, i.e.,
$w(C^n) \le 1$. Since the numerical radius is equivalent to the operator norm,
a stability such as that desired in (4.3-10) can be demonstrated. This con- 
nection and potential application of numerical range to numerical stability 
led to the considerable theoretical research interest in power-bounded op- 
erator families in the 1960s, and thus to the mapping theorems of Chapter 
2. 

The second connection to the numerical range in such applications comes 
from going to the energy norm. For our purposes, we shall regard this as 
going from the operator norm ||A|| to the numerical range norm w(A). In 
applications it often comes down to an appropriate integration by parts. It 
would take us too far afield here to fully elaborate this connection, but let 
us outline it for one important instance, that of the Von Neumann-Kreiss 
stability theorem. 

By substituting Fourier series for $U^{j+1}$ and $U^j$ in the iteration (4.3-5),
one converts the problem to

$$V^{j+1}(k) = G(\Delta t, k)\,V^j(k), \tag{4.3-11}$$

where $V(k)$ denotes the Fourier coefficients, $k$ being the vector of
frequencies within the Fourier expansion. The matrices $G(\Delta t, k)$ are called the
(Von Neumann) amplification matrices, and for stability of the original
scheme, a uniform bound

$$\|(G(\Delta t, k))^n\| \le K \tag{4.3-12}$$

is desired, for all $0 < \Delta t < \tau$, $0 \le n\Delta t \le T$, and all $k$. The stability
theorem is the following.
Theorem 4.3-1 (Discrete boundedness). The following are equivalent:
(1) A family of square matrices $\{A\}$ is uniformly power bounded

$$\|A^n\| \le M$$
for some $M$, all $n = 1, 2, 3, \ldots$, and all $A$ in $\{A\}$.
(2) The family is uniformly resolvent bounded
$$\|(zI - A)^{-1}\| \le K(|z| - 1)^{-1}$$

for some $K$, all $|z| > 1$, and all $A$ in $\{A\}$.
(3) The vector space $V$ on which the family $\{A\}$ operates can be
uniformly transformed to $W$, $w = T_A v$, on which the transformed $A$ has norm

$$\|T_A A T_A^{-1}\| \le 1,$$

where the transforms $T_A$ are uniform in the sense that $T_A^* T_A = H_A$, a
Hermitian positive definite matrix satisfying

$$A^* H_A A \le H_A$$
and
$$C^{-1} \le H_A \le C$$
for some $C$, all $H_A$, and all $A$ in $\{A\}$.


Proof. See (Richtmyer and Morton (1967); see Notes). The equivalence of 
the uniform power-boundedness (1) and the uniform resolvent-boundedness 
(2) can be reasonably expected from faith in the functional calculus for such 
operator families. The renorming (3) can be seen in terms of the numerical 
range if we imagine only the case $T = T^* = H^{1/2}$, for then from $A^*HA \le H$
we have

$$H^{-1/2} A^* H^{1/2} H^{1/2} A H^{-1/2} \le 1$$

or $\|H^{1/2} A H^{-1/2}\|^2 \le 1$. This is nothing more than the fact that the
numerical radius and the operator norm are the same for selfadjoint operators
$T^*T$. $\square$

Theorem 4.3-1 is important because it can then be shown that for a 
number of finite-difference schemes, stability becomes 


$$\|U^{j+1}\|_H \le (1 + O(\Delta t))\,\|U^j\|_H,$$
which is satisfied for (4.3-6) if
$$\|C(\Delta t)\|_H \le 1 + O(\Delta t). \tag{4.3-13}$$

This puts the burden of proof for stability on showing that $A_D$ is $O(1)$
in (4.3-6). For $A_D u$ the finite difference of the one-dimensional Laplacian
$-\Delta_x u = -\partial^2 u/\partial x^2$ in the heat equation example (4.3-2), that implies that
we should consider the energy norm $(A_D u, u) = (\operatorname{grad} u, \operatorname{grad} u)$ for (the
square of) our $H$-norm, for any operator is $O(1)$ in its own operator norm.
In this way, the discrete boundedness theorem (Theorem 4.3-1) can be
seen as yet another instance of the benefits in going from strong to weak 
formulation in the theory of partial differential equations. 


Notes and References for Section 4.3


For discrete initial-value problems, see 

R. D. Richtmyer and K. Morton (1967). Difference Methods for Initial 
Value Problems, Wiley, New York. 

As pointed out there, when amplification matrices are normal operators, 
then the spectral, numerical and operator radii are all equal, and we can 
conclude stability by knowing that all eigenvalues are contained within 
the closed unit disk. Generally, the amplification matrices should not be 
expected to be normal operators when treating complicated systems of 
partial differential equations and specified boundary conditions, although 
it sometimes does happen. 

The discrete boundedness theorem (Theorem 4.3-1) was first shown by 
H. O. Kreiss (1962). “Über die Stabilitätsdefinition für Differenzengleichungen
die partielle Differentialgleichungen approximieren,” BIT 2, 153-181.
When $A$ is indeed normal and with spectrum $\sigma(A)$ within the unit disk,
then $A - zI$ and $(A - zI)^{-1}$ are also normal, and by Theorem 1.4-3, for
$|z| > 1$,

$$\|(A - zI)^{-1}\|^{-1} = d(z, \sigma(A)) \ge |z| - 1$$


so that condition (2) of Theorem 4.3-1 holds with constant K = 1. Such 
considerations can be taken as motivation for the larger class of normal-like 
operators and growth conditions, in which K may be larger, to be treated 
in Chapter 6. 

A return to the power-boundedness of the N x N matrices in Theorem 
4.3-1 and in particular to the question of the best M and K in (1) and (2), 
respectively, and their relationship, may be found in 
R. J. LeVeque and L. N. Trefethen (1984). “On the Resolvent Condition
in the Kreiss Matrix Theorem,” BIT 24, 585-591.

M. N. Spijker (1991). “On a Conjecture by LeVeque and Trefethen Related
to the Kreiss Matrix Theorem,” BIT 31, 551-555.

For purposes of tighter estimates on the stability constants of finite- 
difference schemes, another variant on the numerical range W(A), called 
the M-numerical range, was introduced in 
H. W. J. Lenferink and M. N. Spijker (1990). “A Generalization of the 
Numerical Range of a Matrix,” Linear Algebra Appl. 140, 251-266. 

This numerical range is based on growth rates of the operator powers
$\|(A - \gamma I)^k\|$; see Section 5.5 for further details. The M-numerical range is a
generalization of the algebraic numerical range of [BD] and nicely brings
that abstract theory to bear on concrete questions such as resolvent growth
estimates for discretization of differential equations.


4.4 Fluid Dynamics 


Fluid dynamics is a vast field that has greatly influenced the development 
of mathematics and, in particular, numerical analysis, operator theory, and 
matrix methods. The Navier-Stokes equations for the velocities u, 


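$$
\frac{du}{dt} = \underbrace{k(u)\Delta u}_{\text{diffusion}} - \underbrace{u \cdot \nabla u}_{\text{convection}} - \underbrace{\nabla p}_{\text{pressure gradients}}, \quad t > 0, \qquad u(0) = u_0, \tag{4.4-1}
$$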


may be regarded as an initial-value problem, as in Section 3.4, although 
now with a nonlinear term added, and its discretizations lead to considera- 
tions similar to those of Section 4.3. However, its multidimensionality and 
vector nature, along with its nonlinearity and boundary conditions, have 
led to a whole new field of study, commonly now called computational fluid 
dynamics. 

For our purposes here of illustrating how numerical range considerations, 
chiefly positive definiteness, enter into the numerical analysis of fluid dy- 
namics, let us consider the one-dimensional model problem 


$$
\begin{aligned}
& u_t + v u_x - k u_{xx} = 0, && 0 < x < 1,\ t > 0, \\
& u(x, 0) = f(x) \text{ given}, && 0 < x < 1, \\
& u(0, t) = u(1, t) = 0, && t > 0.
\end{aligned}
\tag{4.4-2}
$$


An initial velocity distribution u(x, 0) is to evolve forward in time, subject 
to zero boundary conditions at the ends of the fluid domain 0 < z < 1. 
The reader may prefer to imagine u to be a temperature, as in the heat 
equation of the previous section, rather than a velocity. However, we now 
have a convective term vu, balanced against a diffusion term ku,,. We 
have linearized the problem by taking a constant given positive convection 
speed v and a constant positive viscosity k. 

The finite-difference method proceeds as in the previous section. The 
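Euler explicit discretizations (4.3-3) now yield the discrete equation

(4.4-3)  $\dfrac{U_{i,j+1} - U_{i,j}}{\Delta t} + v\,\dfrac{U_{i,j} - U_{i-1,j}}{\Delta x} - k\,\dfrac{U_{i-1,j} - 2U_{i,j} + U_{i+1,j}}{(\Delta x)^2} = 0.$

We may write this, analogous to (4.3-4), as

(4.4-4)  $U_{i,j+1} = \left[\dfrac{(k + v\Delta x)\,\Delta t}{(\Delta x)^2}\right] U_{i-1,j} + \left[1 - \dfrac{(2k + v\Delta x)\,\Delta t}{(\Delta x)^2}\right] U_{i,j} + \left[\dfrac{k\,\Delta t}{(\Delta x)^2}\right] U_{i+1,j}.$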


It is easily checked that $U_{i,j+1}$ is a convex combination of the three
preceding values, $U_{i-1,j}$, $U_{i,j}$, $U_{i+1,j}$, provided that

(4.4-5)  $\dfrac{\Delta t}{(\Delta x)^2} \le \dfrac{1}{2k + v\Delta x}.$

This is a typical limitation on the time step $\Delta t$ that one encounters when
using Euler explicit discretizations. It may be overcome by going to
so-called implicit schemes, which we will not discuss here. Accepting this
time step condition, it follows that the discrete solutions $U_{i,j}$ are
uniformly bounded independent of $\Delta x$ and $\Delta t$. This is the case because,
as a convex combination of the three nearest discrete values at the
preceding time step, $U_{i,j+1}$ is bounded by the largest and smallest of
those, which are bounded by the largest and smallest of all discrete values
at the preceding time step. Thus, all are bounded eventually by the
extremes of the initial data. This uniform boundedness stability, in the
sense of (4.3-10), can then, with some more work, be used to prove
rigorously the convergence of the discrete solutions to the true solution
as the grids become arbitrarily fine.
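
The convex-combination argument can be checked numerically. The following Python sketch advances the update (4.4-4) with the largest time step allowed by (4.4-5) and confirms that the discrete solution never leaves the range of the initial data. The initial data f(x) = x(1 - x), the parameter values, and the function names are assumptions chosen here for illustration only.

```python
def explicit_upwind_step(U, v, k, dx, dt):
    """One step of the explicit scheme (4.4-4) with zero boundary values."""
    r = dt / dx**2
    w_left = (k + v * dx) * r            # weight on U[i-1]
    w_mid = 1.0 - (2 * k + v * dx) * r   # weight on U[i]
    w_right = k * r                      # weight on U[i+1]
    new = [0.0] * len(U)                 # boundary entries stay zero
    for i in range(1, len(U) - 1):
        new[i] = w_left * U[i - 1] + w_mid * U[i] + w_right * U[i + 1]
    return new


def run(n_cells=20, v=1.0, k=0.1, steps=200):
    """Evolve sample data and check the discrete maximum principle."""
    dx = 1.0 / n_cells
    dt = dx**2 / (2 * k + v * dx)        # equality in the step condition (4.4-5)
    U = [(i * dx) * (1.0 - i * dx) for i in range(n_cells + 1)]  # assumed f(x)
    lo, hi = min(U), max(U)
    for _ in range(steps):
        U = explicit_upwind_step(U, v, k, dx, dt)
        # Convexity of the weights keeps every value between lo and hi.
        assert lo - 1e-12 <= min(U) and max(U) <= hi + 1e-12
    return lo, hi, U
```

With a time step larger than (4.4-5) permits, the middle weight becomes negative and the bound is no longer guaranteed.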

The numerical range enters into many numerical analysis considerations,
often in a rather elementary way: the positive definiteness of the
discretized infinitesimal generators $A_D$. This property was called
m-accretive in Section 3.4. The numerical range positive definiteness
entered into Theorem 4.3-1 through the uniform condition $C^{-1} \le H_A$
there. Without further belaboring this point, let us just assert that many
numerical iterative methods need a numerical range with positive real part
in order to assure convergence of the algorithm.

When encountering a given discretization, it can be instructive to write 
it on the coarsest grid possible in order to see its basic properties. 


Example. Consider the problem (4.4-2) under scheme (4.4-3) on the
grid h = Δx = 1/3. Disregarding the time derivative for the moment, let
$U_1$ denote u(1/3) and $U_2$ denote u(2/3). Then, the steady problem (time
derivatives set to zero) yields the 2 x 2 system

$\dfrac{v(U_1 - U_0)}{h} - \dfrac{k(U_0 - 2U_1 + U_2)}{h^2} = 0, \quad j = 1,$

$\dfrac{v(U_2 - U_1)}{h} - \dfrac{k(U_1 - 2U_2 + U_3)}{h^2} = 0, \quad j = 2.$

Letting P = hv/k, multiply by $h^2/k$, from which

(4.4-6)  $\underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}}_{A_d\ =\ \text{diffusion}} \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} + P \underbrace{\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}}_{A_c\ =\ \text{convection}} \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} = \underbrace{\begin{bmatrix} (1 + P)U_0 \\ U_3 \end{bmatrix}}_{\text{boundary data}}.$

Equation (4.4-6) shows the discrete balance of diffusion and convection that
makes up the discrete infinitesimal generator $A_D = A_d + PA_c$. Both $A_d$
and $A_c$ are (real) positive definite:

(4.4-7)  $(x, A_d x) = 2(x_1^2 - x_1x_2 + x_2^2) \ge \|x\|^2,$
         $(x, A_c x) = x_1^2 - x_1x_2 + x_2^2 \ge \tfrac{1}{2}\|x\|^2,$

and hence $A_D$ is positive definite with lower bound $m(A_D) = 1 + P/2$.
This foretells that the discrete time-dependent problem 


(4.4-8)  $\dfrac{du}{dt} = -\left(\dfrac{k}{h^2}A_d + \dfrac{v}{h}A_c\right)u$

corresponding to (4.4-2) will exponentiate well, i.e., that the problem is well 
posed with unique solution. The number P is a discrete Reynolds number 
for the problem, often called the Peclet number. If we let the basic given 
convective velocity v become negative, when the Peclet number P = —2, 
the lower bound m(Ap) passes through zero and we can expect numerous 
numerical difficulties. For the coarse h = 1/3 grid of this example, that is 
when the convective velocity is —6 times the basic diffusion constant k. In 
this instance, a numerical remedy is simply to “downwind,” i.e., difference 
the convection term in the opposite direction. This creates what is then 
called an implicit scheme. 
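
As a small check of the lower bound just discussed, the following Python sketch computes the smallest eigenvalue of the symmetric part of A_d + P A_c, which is the lower bound of its (real) numerical range. Here A_d and A_c are taken to be the 2 x 2 matrices whose quadratic forms appear in (4.4-7); the function name is an assumption made for illustration.

```python
import math


def lower_bound(P):
    """Smallest value of (x, (A_d + P*A_c) x) over real unit vectors x."""
    Ad = [[2.0, -1.0], [-1.0, 2.0]]   # diffusion matrix
    Ac = [[1.0, 0.0], [-1.0, 1.0]]    # convection matrix (backward difference)
    AD = [[Ad[i][j] + P * Ac[i][j] for j in range(2)] for i in range(2)]
    # Only the symmetric part contributes to the real quadratic form.
    a, d = AD[0][0], AD[1][1]
    b = 0.5 * (AD[0][1] + AD[1][0])
    # Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]].
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
```

For P > -2 this returns 1 + P/2, it vanishes at P = -2, and it is negative beyond that, in agreement with the discussion of the Peclet number above.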

Next, let us briefly consider the finite-element method. We will see sim- 
ilar numerical range considerations, notably positive definiteness, enter in 
a critical way. Assuming zero boundary conditions, the partial differential 
equation of (4.4-2) may be multiplied by an arbitrary “test function” $\phi$ and
integrated over the interval, from which, by an integration by parts,


(4.4-9)  $\displaystyle\int_0^1 u_t\,\phi + v\int_0^1 u_x\,\phi + k\int_0^1 u_x\,\phi_x = 0.$


Let us assume a solution of the separated form 


(4.4-10)  $u(x, t) = \displaystyle\sum_{i=1}^{N} c_i(t)\,\phi_i(x),$

where the $\phi_i$ are some desirable linearly independent basis set. Substituting
(4.4-10) into (4.4-9) and taking $\phi$ to be $\phi_1, \ldots, \phi_N$ in turn yields the
so-called Galerkin N x N ordinary differential equation system of the form

(4.4-11)  $M_D C'(t) + A_D C(t) = 0.$

Assuming $M_D$ to be invertible, this is now the discrete initial-value problem

(4.4-12)  $\dfrac{dC(t)}{dt} = -M_D^{-1}A_D\,C(t), \qquad C(0) = C_0,$

where $C_0$ is the vector of initial coefficients from the initial-value
representation $u(x, 0) = \sum_{i=1}^{N} c_i(0)\,\phi_i(x)$.

The following coarse discretization illustrates this method. 

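Example. Let N = 2, and let $\phi_1(x)$ be the piecewise linear “hat”
function which rises with slope 1/h from x = 0 to x = h, then descends
with slope -1/h from x = h to x = 2h, and is 0 for x between 2h and
3h = 1. Similarly, define $\phi_2(x)$ to be 0 from x = 0 to x = h, rising to 1 at
2h, then down to 0 at x = 3h = 1.

In this discretization, the Galerkin finite-element system (4.4-11) becomes

(4.4-13)  $\begin{bmatrix} \int_0^1 \phi_1\phi_1 & \int_0^1 \phi_2\phi_1 \\ \int_0^1 \phi_1\phi_2 & \int_0^1 \phi_2\phi_2 \end{bmatrix} \begin{bmatrix} c_1' \\ c_2' \end{bmatrix} + v \begin{bmatrix} \int_0^1 \phi_{1,x}\phi_1 & \int_0^1 \phi_{2,x}\phi_1 \\ \int_0^1 \phi_{1,x}\phi_2 & \int_0^1 \phi_{2,x}\phi_2 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + k \begin{bmatrix} \int_0^1 \phi_{1,x}\phi_{1,x} & \int_0^1 \phi_{2,x}\phi_{1,x} \\ \int_0^1 \phi_{1,x}\phi_{2,x} & \int_0^1 \phi_{2,x}\phi_{2,x} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$

Doing the integrations yields

(4.4-14)  $\underbrace{\dfrac{h}{6}\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}}_{\text{mass matrix}} \begin{bmatrix} c_1' \\ c_2' \end{bmatrix} + \underbrace{\dfrac{v}{2}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}}_{\text{convection}} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \underbrace{\dfrac{k}{h}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}}_{\text{diffusion}} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$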


Note that the mass matrix $M_D$ is positive definite and that we arrive at
the same positive definite diffusion matrix $A_d$ as before. The convection
matrix $A_c$ is no longer positive definite but has zero (real) numerical range.
We remark that, by Theorem 2.4-1, the spectrum $\sigma(M_D^{-1}A_D)$ is real and
positive. Thus, we expect that (4.4-14) will exponentiate well.
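
The integrations behind the Galerkin matrices can be reproduced numerically. The following Python sketch assembles the mass, convection, and diffusion integrals for the two hat functions on the grid h = 1/3, using Simpson's rule element by element (exact here, since the integrands are at most quadratic on each element). The function names and the element-wise layout are assumptions made for illustration.

```python
def hat(i, x, h):
    """Piecewise-linear hat function equal to 1 at the node x = i*h."""
    return max(0.0, 1.0 - abs(x - i * h) / h)


def dhat(i, x, h):
    """Derivative of the hat function at a point x that is not a node."""
    if (i - 1) * h < x < i * h:
        return 1.0 / h
    if i * h < x < (i + 1) * h:
        return -1.0 / h
    return 0.0


def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree three."""
    return (b - a) / 6.0 * (g(a) + 4.0 * g(0.5 * (a + b)) + g(b))


def galerkin_matrices(h=1.0 / 3.0, n_elems=3, nodes=(1, 2)):
    """Return (mass, convection, diffusion) integral matrices, unscaled by v, k."""
    n = len(nodes)
    M = [[0.0] * n for _ in range(n)]
    Cv = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for e in range(n_elems):
        a, b = e * h, (e + 1) * h
        mid = 0.5 * (a + b)
        for r, i in enumerate(nodes):        # test function phi_i (row)
            for c, j in enumerate(nodes):    # trial function phi_j (column)
                di, dj = dhat(i, mid, h), dhat(j, mid, h)  # constant per element
                M[r][c] += simpson(lambda x: hat(i, x, h) * hat(j, x, h), a, b)
                Cv[r][c] += simpson(lambda x: dj * hat(i, x, h), a, b)
                K[r][c] += (b - a) * di * dj
    return M, Cv, K
```

The convection matrix that results is skew-symmetric, which is why its quadratic form has zero real part, while the mass and diffusion matrices are symmetric positive definite.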

Finally, let us look at the equations of gas dynamics. There, in order to 
simplify, one assumes no viscosity k in the Navier-Stokes equations (4.4- 
1), but on the other hand one must allow a variable density. From these 
assumptions, one arrives at the important Euler system of three first-order 
partial differential equations: 


(4.4-15)  $\begin{bmatrix} \rho \\ \rho u \\ e \end{bmatrix}_t + \begin{bmatrix} \rho u \\ \rho u^2 + p \\ (e + p)u \end{bmatrix}_x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$


These are an example of a hyperbolic system of conservation laws. Here
$\rho$ is the density of the gas, $m = \rho u$ is the momentum, u is the velocity,
p is the pressure, $e = \rho\varepsilon + \rho u^2/2$ is the total energy per unit volume,
$\varepsilon$ is the internal energy per unit mass, and only one space dimension x
is considered.

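Using an assumed additional equation of state $p = (\gamma - 1)\rho\varepsilon$, where $\gamma > 1$
is a known gas constant, and calculating the Jacobian of the system (4.4-15)
to convert it to a quasilinear form, one arrives at the matrix system

(4.4-16)  $\begin{bmatrix} \rho \\ m \\ e \end{bmatrix}_t + \begin{bmatrix} 0 & 1 & 0 \\ (\gamma - 3)u^2/2 & (3 - \gamma)u & \gamma - 1 \\ (\gamma - 1)u^3 - \gamma e u/\rho & \gamma e/\rho - 3(\gamma - 1)u^2/2 & \gamma u \end{bmatrix} \begin{bmatrix} \rho \\ m \\ e \end{bmatrix}_x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$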

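The matrix A of (4.4-16) is an example of the nonnormal matrices one may
encounter in the numerical analysis of gas dynamics. A simpler matrix may
be obtained by writing (4.4-16) in terms of the primitive variables $\rho$, u, and
p, from which one arrives at the system

(4.4-17)  $\begin{bmatrix} \rho \\ u \\ p \end{bmatrix}_t + \begin{bmatrix} u & \rho & 0 \\ 0 & u & 1/\rho \\ 0 & \gamma p & u \end{bmatrix} \begin{bmatrix} \rho \\ u \\ p \end{bmatrix}_x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$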


Of course, the matrix A of (4.4-17) depends on the unknown flow quantities 
themselves and hence will vary with them, but a qualitative theory may be 
obtained for such systems. 

For example, one finds that the eigenvalues of A are u, u + c, and u - c,
where $c^2 = \gamma p/\rho$ represents an acoustic propagation speed. From the
corresponding eigenvectors, one may diagonalize A. But the theory is
essentially incomplete for higher space dimensions. Therefore, the brunt of
investigation has been computational. One such early scheme, the
Lax-Wendroff scheme, will be examined in the next section.
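
The eigenvalue claim is easy to confirm numerically. The following Python sketch evaluates det(A - λI) for the primitive-variable matrix of (4.4-17) at λ = u, u + c, u - c. The sample flow state and the function names are assumptions chosen for illustration.

```python
import math


def det3(M):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(A, lam):
    """det(A - lam*I) for a 3x3 matrix A."""
    return det3([[A[r][s] - (lam if r == s else 0.0) for s in range(3)]
                 for r in range(3)])


def primitive_matrix(rho, u, p, gamma):
    """Coefficient matrix of (4.4-17) in the variables (rho, u, p)."""
    return [[u, rho, 0.0], [0.0, u, 1.0 / rho], [0.0, gamma * p, u]]


rho, u, p, gamma = 1.2, 0.5, 1.0, 1.4      # assumed sample state
c = math.sqrt(gamma * p / rho)             # acoustic speed, c^2 = gamma*p/rho
A = primitive_matrix(rho, u, p, gamma)
residuals = [char_poly(A, lam) for lam in (u, u + c, u - c)]
```

All three residuals vanish to rounding error, since det(A - λI) = (u - λ)[(u - λ)^2 - γp/ρ].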

It is interesting to ask what kind of operator the matrix A of (4.4-17) 
is, for the flow variables assumed constant at some designated time. A 
straightforward computation yields 


(4.4-18)  $AA^* - A^*A = \rho^2 \begin{bmatrix} 1 & 0 & c^2 \\ 0 & \rho^{-4} - 1 - c^4 & 0 \\ c^2 & 0 & c^4 - \rho^{-4} \end{bmatrix}.$


For fun, let us examine its eigenvalues. Setting aside the positive factor
$\rho^2$, the scalar equation $\det[\rho^{-2}(AA^* - A^*A) - \lambda I] = 0$ for the
bracketed matrix gives

(4.4-19)  $[\lambda^2 + (\rho^{-4} - c^4 - 1)\lambda - \rho^{-4}] \cdot [(\rho^{-4} - c^4 - 1) - \lambda] = 0.$

Let b denote the term $(\rho^{-4} - c^4 - 1)$. Then, the eigenvalues are seen to be

(4.4-20)  $\lambda = b, \qquad \lambda = \dfrac{-b \pm (b^2 + 4\rho^{-4})^{1/2}}{2}.$

This shows the critical role of the evolving density $\rho$ relative to the
acoustic speed $c = (\gamma p/\rho)^{1/2}$ in determining the signs of the eigenvalues.
For example, when $b \ne 0$, neither A nor A* is even hyponormal (see Chapter 6
for these operator classes). When b = 0, one has the interesting reduction
that the eigenvalues $\lambda = 0, \pm\rho^{-2}$ depend only on the density.
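
The self-commutator formula can be verified directly. The following Python sketch forms AA^T - A^T A for the real matrix of (4.4-17) at a sample state and compares it entry by entry with ρ² times the matrix with entries 1, c², ρ⁻⁴ - 1 - c⁴, and c⁴ - ρ⁻⁴ as in (4.4-18). It then checks that the two roots of the quadratic factor in (4.4-19) have opposite signs. The sample state and names are assumptions for illustration.

```python
def self_commutator(A):
    """Return A A^T - A^T A for a real square matrix A (list of rows)."""
    n = len(A)
    At = [[A[j][i] for j in range(n)] for i in range(n)]

    def mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    AAt, AtA = mul(A, At), mul(At, A)
    return [[AAt[i][j] - AtA[i][j] for j in range(n)] for i in range(n)]


def formula_4_4_18(rho, c2):
    """rho^2 times the bracketed matrix of (4.4-18); c2 stands for c^2."""
    r4 = rho ** -4
    inner = [[1.0, 0.0, c2],
             [0.0, r4 - 1.0 - c2 ** 2, 0.0],
             [c2, 0.0, c2 ** 2 - r4]]
    return [[rho ** 2 * entry for entry in row] for row in inner]


rho, u, p, gamma = 1.2, 0.5, 1.0, 1.4           # assumed sample state
c2 = gamma * p / rho
A = [[u, rho, 0.0], [0.0, u, 1.0 / rho], [0.0, gamma * p, u]]
K = self_commutator(A)
F = formula_4_4_18(rho, c2)
max_diff = max(abs(K[i][j] - F[i][j]) for i in range(3) for j in range(3))

b = rho ** -4 - c2 ** 2 - 1.0
# Roots of lambda^2 + b*lambda - rho^-4 = 0; their product is negative.
roots = [(-b + s * (b * b + 4 * rho ** -4) ** 0.5) / 2 for s in (1, -1)]
```

Because the product of the two quadratic roots is -ρ⁻⁴ < 0, the self-commutator is always indefinite, which is the numerical content of the remark on hyponormality.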


Notes and References for Section 4.4 


There are a great many references to fluid dynamics and its numerical 
analysis, but Richtmyer and Morton (1967) as cited in the preceding section 
remains an excellent starting treatment. 

One may view discretization of partial differential equations as three
approaches, the finite-difference, finite-element, and finite-spectral
methods; see
K. Gustafson (1987). Partial Differential Equations, 2nd Edition, Wiley,
New York.

In the third editions of that book

K. Gustafson (1991, 1992). Applied Partial Differential Equations I, II,
Kaigai Publishers, Tokyo, Japan (in Japanese).

K. Gustafson (1993). Introduction to Partial Differential Equations and 
Hilbert Space Methods, 3rd Edition, International Journal Services, Cal- 
cutta, India. 

one will find treatment of hyperbolic systems of partial differential equa- 
tions and computational gas dynamics. 


4.5 Lax—Wendroff Scheme 


If one takes a conservation law

(4.5-1)  $u_t + (F(u))_x = 0$

such as (4.4-15) and differentiates it with respect to t, one gets an expression
for $u_{tt}$, namely

(4.5-2)  $u_{tt} = -F_{xt} = -F_{tx} = -(Au_t)_x = (AF_x)_x,$

where A = A(u) is the Jacobian matrix of partial derivatives of F with
respect to u. We may then substitute (4.5-1) and (4.5-2) into the Taylor
series

(4.5-3)  $U_{i,j+1} = U_{i,j} + \Delta t\,(U_t)_{i,j} + \dfrac{(\Delta t)^2}{2}(U_{tt})_{i,j} + \cdots \approx U_{i,j} - \Delta t\,(F(U)_x)_{i,j} + \dfrac{(\Delta t)^2}{2}\big((A\,F(U)_x)_x\big)_{i,j},$

where we have gone from small u to big U to signal a discretization in
progress. Assuming that A is constant, and using centered differences for
the x derivatives, this gives the Lax-Wendroff second-order scheme

(4.5-4)  $U_{i,j+1} = U_{i,j} - \dfrac{\Delta t}{2\Delta x}A\,(U_{i+1,j} - U_{i-1,j}) + \dfrac{(\Delta t)^2}{2(\Delta x)^2}A^2\,(U_{i+1,j} - 2U_{i,j} + U_{i-1,j}).$


This is the basic Lax-Wendroff scheme, which provided a significant
advance in the numerical treatment of gas dynamics. For the system
(4.4-17), for example, the scheme (4.5-4) is stable provided that
$\Delta t < \Delta x\,(|u| + c)^{-1}$, a fact rather easily obtained by the method of
amplification matrix discussed in Section 4.3. Although the Lax-Wendroff
scheme has been somewhat superseded by a number of newer schemes, it
remains a basic workhorse scheme and its results a point of comparison for
other schemes.
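
To make the scheme concrete, here is a minimal Python sketch of (4.5-4) for the scalar case F(u) = au on a periodic grid, written in terms of the ratio nu = a Δt/Δx. The scalar setting, the periodic boundary treatment, and the function name are assumptions made for illustration; the system case replaces the scalar a by the matrix A.

```python
def lax_wendroff_step(U, nu):
    """One Lax-Wendroff step for u_t + a u_x = 0, with nu = a*dt/dx.

    U is a list of grid values on a periodic grid; U[i - 1] wraps around
    through Python's negative indexing, and (i + 1) % n wraps on the right.
    """
    n = len(U)
    return [
        U[i]
        - 0.5 * nu * (U[(i + 1) % n] - U[i - 1])
        + 0.5 * nu * nu * (U[(i + 1) % n] - 2.0 * U[i] + U[i - 1])
        for i in range(n)
    ]
```

For nu = 1 the update reduces to an exact shift by one grid cell. On the sawtooth mode U_i = (-1)^i the step multiplies the data by 1 - 2 nu², so any nu > 1 amplifies that mode, in line with the step restriction quoted above.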

In this section, we will demonstrate the stability of a Lax—Wendroff 
scheme as an instance of the power inequality for the numerical range, 
Theorem 2.1-1. 

Consider the first-order linear hyperbolic system in two space dimensions:

(4.5-5)  $u_t = Au_x + Bu_y, \quad -\infty < x, y < \infty,$
         $u(x, y, 0) = f(x, y),$

where $u = (u_1(x, y, t), \ldots, u_n(x, y, t))$ is the unknown vector, A and B are
given selfadjoint matrices, and $u_x$, $u_y$, $u_t$ denote the usual partial
derivatives. Let Δx > 0, Δy > 0, and Δt > 0 be the increments in a
finite-difference scheme with fixed ratios

$\lambda = \dfrac{\Delta t}{\Delta x} \quad \text{and} \quad \mu = \dfrac{\Delta t}{\Delta y}.$

Taking the grid points to be

$(x_j, y_k, t_m) = (j\Delta x, k\Delta y, m\Delta t), \quad m = 0, 1, 2, \ldots, \quad j, k = 0, \pm 1, \pm 2, \ldots,$

and denoting $u(x_j, y_k, t_m)$ by $u_{j,k}^m$, as described earlier, we may consider
the Taylor expansion about $(x_j, y_k, t_m)$

$u_{j,k}^{m+1} = u_{j,k}^m + \Delta t\,(u_t)_{j,k}^m + \tfrac{1}{2}(\Delta t)^2 (u_{tt})_{j,k}^m + \cdots,$

from which, as earlier, we arrive at the Lax-Wendroff discretization

(4.5-6)  $\begin{aligned} v_{j,k}^{m+1} = v_{j,k}^m &+ \tfrac{1}{2}\lambda A\,(v_{j+1,k}^m - v_{j-1,k}^m) + \tfrac{1}{2}\mu B\,(v_{j,k+1}^m - v_{j,k-1}^m) \\ &+ \tfrac{1}{2}\lambda^2 A^2\,(v_{j+1,k}^m - 2v_{j,k}^m + v_{j-1,k}^m) + \tfrac{1}{2}\mu^2 B^2\,(v_{j,k+1}^m - 2v_{j,k}^m + v_{j,k-1}^m) \\ &+ \tfrac{1}{8}\lambda\mu\,(AB + BA)(v_{j+1,k+1}^m - v_{j+1,k-1}^m - v_{j-1,k+1}^m + v_{j-1,k-1}^m). \end{aligned}$


As discussed in Section 4.3, given that the scheme (4.5-6) is consistent, 
the question of its convergence reduces to that of its stability. To answer 
this question, we introduce the amplification matrix, obtained as in Section 
4.3 by Fourier transform of the difference scheme: 


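(4.5-7)  $\begin{aligned} T = T(\xi, \eta, \lambda, \mu) = I &+ \tfrac{1}{2}\lambda A\,(e^{i\xi} - e^{-i\xi}) + \tfrac{1}{2}\mu B\,(e^{i\eta} - e^{-i\eta}) \\ &+ \tfrac{1}{2}\lambda^2 A^2\,(e^{i\xi} - 2 + e^{-i\xi}) + \tfrac{1}{2}\mu^2 B^2\,(e^{i\eta} - 2 + e^{-i\eta}) \\ &+ \tfrac{1}{8}\lambda\mu\,(AB + BA)(e^{i(\xi+\eta)} - e^{i(\xi-\eta)} - e^{-i(\xi-\eta)} + e^{-i(\xi+\eta)}). \end{aligned}$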


We recall that the scheme (4.5-6) will be stable if there exists a norm $\|\cdot\|$
and a fixed constant K > 0 such that

$\|T^m\| \le K, \quad m = 1, 2, 3, \ldots, \quad -\pi \le \xi \le \pi, \ -\pi \le \eta \le \pi.$

Since all norms on $M_n$ are equivalent, we can use the numerical radius
w(T).

Theorem 4.5-1. The Lax-Wendroff scheme (4.5-6) is stable if

(4.5-8a)  $\lambda^2 w^2(A) + \mu^2 w^2(B) \le \tfrac{1}{4}$

or, equivalently, if the time step

(4.5-8b)  $\Delta t \le \dfrac{1}{2}\left[\dfrac{w^2(A)}{(\Delta x)^2} + \dfrac{w^2(B)}{(\Delta y)^2}\right]^{-1/2}.$


Proof. Using the simplifications

$e^{i\xi} - e^{-i\xi} = 2i\sin\xi, \qquad e^{i\xi} - 2 + e^{-i\xi} = 2\cos\xi - 2,$

and

$e^{i(\xi+\eta)} - e^{i(\xi-\eta)} - e^{-i(\xi-\eta)} + e^{-i(\xi+\eta)} = 2\cos(\xi + \eta) - 2\cos(\xi - \eta) = -4\sin\xi\sin\eta,$

we can write T as R + iJ, where the selfadjoint operators R and J are
given by

$R = I - C,$

where

$C = (1 - \cos\xi)\lambda^2 A^2 + (1 - \cos\eta)\mu^2 B^2 + \tfrac{1}{2}\lambda\mu\sin\xi\sin\eta\,(AB + BA)$

and

$J = \lambda\sin\xi\,A + \mu\sin\eta\,B.$
We now prove that $w(T) \le 1$. If so, then $w(T^m) \le 1$ for all m by the
numerical range power inequality of Chapter 2, hence
$\|T^m\| \le 2w(T^m) \le 2$, and stability of the scheme (4.5-6) will have been
shown.

To that end, we compute (Tx, x). For $x \in C^n$, $\|x\| = 1$,

(4.5-9)  $\begin{aligned} |(Tx, x)|^2 &= (Rx, x)^2 + (Jx, x)^2 \\ &= ((I - C)x, x)^2 + (Jx, x)^2 \\ &= 1 + (Cx, x)^2 - 2(Cx, x) + (Jx, x)^2. \end{aligned}$

We will prove that

(4.5-10)  $(Jx, x)^2 - 2(Cx, x) \le -(1 - \cos\xi)^2\lambda^2\|Ax\|^2 - (1 - \cos\eta)^2\mu^2\|Bx\|^2,$

which with (4.5-9) implies that

(4.5-11)  $|(Tx, x)|^2 \le 1 + (Cx, x)^2 - (1 - \cos\xi)^2\lambda^2\|Ax\|^2 - (1 - \cos\eta)^2\mu^2\|Bx\|^2.$

Then, we will prove that

(4.5-12)  $(Cx, x)^2 \le (1 - \cos\xi)^2\lambda^2\|Ax\|^2 + (1 - \cos\eta)^2\mu^2\|Bx\|^2,$

which with (4.5-11) implies that

(4.5-13)  $|(Tx, x)|^2 \le 1.$
Turning first to (4.5-10), we see that

$\begin{aligned} 2C - J^2 &= 2(1 - \cos\xi)\lambda^2 A^2 + 2(1 - \cos\eta)\mu^2 B^2 + \sin\xi\sin\eta\,\lambda\mu\,(AB + BA) \\ &\quad - [\lambda^2\sin^2\xi\,A^2 + \mu^2\sin^2\eta\,B^2 + \lambda\mu\sin\xi\sin\eta\,(AB + BA)] \\ &= \lambda^2(1 - \cos\xi)^2 A^2 + \mu^2(1 - \cos\eta)^2 B^2, \end{aligned}$

and hence by the Schwarz inequality,

$(Jx, x)^2 - 2(Cx, x) \le ((J^2 - 2C)x, x) = -(1 - \cos\xi)^2\lambda^2\|Ax\|^2 - (1 - \cos\eta)^2\mu^2\|Bx\|^2.$

Having established (4.5-10), we turn next to (4.5-12). To that end, we
compute

(4.5-14)  $(Cx, x) = (1 - \cos\xi)\lambda^2\|Ax\|^2 + (1 - \cos\eta)\mu^2\|Bx\|^2 + \tfrac{1}{2}\lambda\mu\sin\xi\sin\eta\,((AB + BA)x, x).$

Bounding the last term as

(4.5-15)  $\begin{aligned} \tfrac{1}{2}|\lambda\mu\sin\xi\sin\eta\,((AB + BA)x, x)| &\le |\lambda\mu\sin\xi\sin\eta\,(Ax, Bx)| \\ &\le \lambda\mu\,|\sin\xi\sin\eta|\,\|Ax\|\,\|Bx\| \\ &\le \tfrac{1}{2}[\lambda^2\sin^2\xi\,\|Ax\|^2 + \mu^2\sin^2\eta\,\|Bx\|^2] \\ &= \tfrac{1}{2}[\lambda^2(1 - \cos^2\xi)\|Ax\|^2 + \mu^2(1 - \cos^2\eta)\|Bx\|^2] \\ &\le (1 - \cos\xi)\lambda^2\|Ax\|^2 + (1 - \cos\eta)\mu^2\|Bx\|^2, \end{aligned}$

since we always have, for any $\theta$,

$\tfrac{1}{2}(1 - \cos^2\theta) = \tfrac{1}{2}(1 + \cos\theta)(1 - \cos\theta) \le 1 - \cos\theta,$

from (4.5-14) and (4.5-15), we thus have

$|(Cx, x)| \le 2(1 - \cos\xi)\lambda^2\|Ax\|^2 + 2(1 - \cos\eta)\mu^2\|Bx\|^2.$

By the Schwarz inequality,

$(Cx, x)^2 \le 4\,[(1 - \cos\xi)^2\lambda^2\|Ax\|^2 + (1 - \cos\eta)^2\mu^2\|Bx\|^2]\,[\lambda^2\|Ax\|^2 + \mu^2\|Bx\|^2].$

Since A and B are selfadjoint, we have

$\|Ax\| \le \|A\| = w(A) \quad \text{and} \quad \|Bx\| \le \|B\| = w(B).$

Therefore, using the condition $\lambda^2 w^2(A) + \mu^2 w^2(B) \le \tfrac{1}{4}$ of Theorem 4.5-1,
we have the desired (4.5-12). As noted earlier, this implies $w(T) \le 1$ and
$w(T^m) \le 1$ by the power inequality. □
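
The theorem can be illustrated in the simplest case n = 1, where A = a and B = b are real numbers, w(A) = |a|, w(B) = |b|, and the numerical radius of T is just |T|. The following Python sketch scans the amplification factor R + iJ = (1 - C) + iJ over a grid of frequencies. The reduction to scalars, the grid resolution, and the names `la` (for λa) and `mb` (for μb) are assumptions made for illustration.

```python
import math


def amplification(xi, eta, la, mb):
    """Scalar amplification factor T = (1 - C) + iJ, with la = lambda*a, mb = mu*b."""
    c = ((1 - math.cos(xi)) * la ** 2
         + (1 - math.cos(eta)) * mb ** 2
         + la * mb * math.sin(xi) * math.sin(eta))   # (1/2)*lambda*mu*(2ab)*sin*sin
    j = la * math.sin(xi) + mb * math.sin(eta)
    return complex(1 - c, j)


def max_modulus(la, mb, n=60):
    """Largest |T| over an (n+1) x (n+1) grid of (xi, eta) in [-pi, pi]^2."""
    grid = [-math.pi + 2 * math.pi * i / n for i in range(n + 1)]
    return max(abs(amplification(xi, eta, la, mb)) for xi in grid for eta in grid)
```

With la² + mb² ≤ 1/4 the maximum stays at 1, while for la = mb = 0.8 the value at ξ = η = π is |1 - 2.56| = 1.56, so the scheme amplifies the highest frequency.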


Notes and References for Section 4.5 


The Lax-Wendroff scheme was first introduced in 

P. Lax and B. Wendroff (1960). “Systems of Conservation Laws,” Comm. 
Pure Appl. Math. 13, 217-237. 

The numerical range appears implicitly in their stability condition

$|(u, Gu)| \le (1 + O(\Delta t))\|u\|^2$

in

P. Lax and B. Wendroff (1964). “Difference Schemes for Hyperbolic
Equations with High Order of Accuracy,” Comm. Pure Appl. Math. 17,
381-398.

Note that this is condition (4.3-13) although stated for the amplification 
matrix G rather than the scheme matrix C. Theorem 4.5-1 was shown in 
M. Goldberg and E. Tadmor (1982). “On the Numerical Radius and its 
Applications,” Linear Alg. Appl. 42, 263-284. 


4.6 Pseudo Eigenvalues 


The notion of pseudo eigenvalues has been recently introduced into numer- 
ical analysis to overcome a rather fundamental limitation encountered in 
actual computation: the sensitivity of eigenvalues to small perturbations. 
Many results in numerical theory are stated in terms of and depend rather 
precisely on the eigenvalues of operators. For selfadjoint and normal op- 
erators, or near normal operators, the eigenvalues are often insensitive to 
small perturbations. However, as one treats far from normal operators, 
extreme sensitivity can occur. 
Example. Consider the left shift on n-space

(4.6-1)  $A = \begin{bmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{bmatrix}.$

If one perturbs by B, the n x n matrix of all zeros except its extreme
lower left corner element $b_{n1}$, which is taken to be $\epsilon$ with $\epsilon$ small, then the
eigenvalues of A (an n-fold 0) become the nth roots of $\epsilon$, since
$\det(\lambda I - A - B) = \lambda^n - \epsilon$. Thus, the single point spectrum
$\sigma(A) = \{0\}$ under perturbation has changed to n points on the circle of
radius $\epsilon^{1/n}$. For example, for n = 64 and $\epsilon$ = 0.1, the spectrum
$\sigma(A + B)$ lies on the circle of radius 0.9646616199.
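
The radius quoted in this example can be reproduced in a few lines. The following Python sketch uses the fact that the characteristic polynomial of the perturbed shift A + B is λ^n - ε, lists its n roots, and checks each of them. The function name is an assumption made for illustration.

```python
import cmath
import math


def perturbed_shift_eigenvalues(n, eps):
    """Roots of lambda^n = eps, i.e. the eigenvalues of the perturbed shift."""
    radius = eps ** (1.0 / n)
    return [radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]


eigs = perturbed_shift_eigenvalues(64, 0.1)
radius = abs(eigs[0])                       # common modulus of all 64 roots
residual = max(abs(z ** 64 - 0.1) for z in eigs)
```

A perturbation of size 0.1 in a single entry thus moves the eigenvalues from 0 almost to the unit circle.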

The fact that the numerical range is stable under additive perturbations,
whereas the spectrum is not, i.e.,

$\sigma(A + B) \not\subset \sigma(A) + \sigma(B),$
$W(A + B) \subset W(A) + W(B),$

is one of the virtues of using the numerical range in perturbation contexts,
especially for general operators or matrices for which the convex hull of the
spectrum is significantly different from the numerical range. Between the
spectrum $\sigma(A)$ and the augmented numerical range

(4.6-2)  $W_\epsilon(A) = W(A) + \Delta_\epsilon,$

where $\Delta_\epsilon$ denotes the closed disk of radius $\epsilon$, is the pseudo-spectrum
$\sigma_\epsilon(A)$, which is defined to be all $\lambda$ that are the eigenvalues of some
perturbed matrix A + E with $\|E\| \le \epsilon$.


Theorem 4.6-1. For a given n x n matrix A and a given $\epsilon \ge 0$, the
following are equivalent:
(i) $\lambda$ is an $\epsilon$-pseudo-eigenvalue of A,
(ii) $\|(\lambda I - A)x\| \le \epsilon$ for some $\|x\| = 1$,
(iii) $\|(\lambda I - A)^{-1}\|^{-1} \le \epsilon$,
(iv) the smallest singular value of $\lambda I - A$ is $\le \epsilon$.


Proof. We prove (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i). If $\lambda$ is an
$\epsilon$-pseudo-eigenvalue of A, then there is some perturbation E for which
$(A + E)x = \lambda x$, $\|x\| = 1$. Thus $\|(\lambda I - A)x\| = \|Ex\| \le \epsilon$. As
$\|(\lambda I - A)^{-1}\|^{-1}$ is the infimum of $\|(\lambda I - A)x\|$ over all $\|x\| = 1$,
(iii) follows immediately from (ii); here we have assumed that $\lambda$ is not in
$\sigma(A)$. Similarly, $\|(\lambda I - A)^{-1}\|^{-1}$ is the smallest singular value
$\sigma_n$ of $\lambda I - A$, by the spectral mapping theorem. Finally, given the
singular value decomposition

(4.6-3)  $\lambda I - A = U\Sigma V^* = \displaystyle\sum_{j=1}^{n} \sigma_j u_j v_j^*,$

we may take $E = \sigma_n u_n v_n^*$, which has norm $\sigma_n \le \epsilon$, so that

$\lambda I - A - E = \displaystyle\sum_{j=1}^{n-1} \sigma_j u_j v_j^*,$

which, being singular, means that $\lambda$ is an eigenvalue of A + E. □
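
Criterion (iv) gives a practical test. The following Python sketch applies it to 2 x 2 matrices, where the smallest singular value has a closed form in terms of the Frobenius norm and the determinant. The restriction to 2 x 2 matrices, the sample shift matrix, and the function names are assumptions made for illustration.

```python
import math


def sigma_min_2x2(M):
    """Smallest singular value of a 2x2 (possibly complex) matrix.

    Uses sigma^2 = (F - sqrt(F^2 - 4|det|^2)) / 2, where F is the squared
    Frobenius norm, i.e. the sum of the two squared singular values.
    """
    (a, b), (c, d) = M
    F = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det = a * d - b * c
    disc = max(F * F - 4 * abs(det) ** 2, 0.0)   # guard against rounding
    return math.sqrt(max((F - math.sqrt(disc)) / 2, 0.0))


def is_pseudo_eigenvalue(lam, A, eps):
    """Criterion (iv): smallest singular value of lam*I - A is at most eps."""
    M = [[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]]
    return sigma_min_2x2(M) <= eps


A = [[0.0, 1.0], [0.0, 0.0]]   # the 2x2 shift, whose only eigenvalue is 0
eps = 0.01
```

Here λ = 0.1 is an exact eigenvalue of A + E with E having the single entry ε in the lower left corner, so it passes the test, whereas λ = 0.5 does not.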

The insensitivity of pseudo-spectra to perturbations, and the inclusion of 
the pseudo-spectra within the augmented numerical range, are perhaps the 
two basic properties of the pseudo-spectrum o,(A), aside from the easily 
checked fact that it is closed. 


Theorem 4.6-2 (Pseudo-spectra stability). Let D be a perturbation of
norm $\delta$. Then

(4.6-4)  $\sigma_{\epsilon-\delta}(A + D) \subset \sigma_\epsilon(A) \subset \sigma_{\epsilon+\delta}(A + D),$

(4.6-5)  $\sigma_\epsilon(A) \subset W_\epsilon(A).$


Proof. To demonstrate (4.6-4), given $\lambda$ an eigenvalue of some A + E with
$\|E\| \le \epsilon$, then $\lambda$ is an eigenvalue of A + D + (E - D) and
$\|E - D\| \le \epsilon + \delta$. If $\delta > \epsilon$, $\sigma_{\epsilon-\delta}$ is taken to be the null set.
As to (4.6-5), from $(A + E)x = \lambda x$ and $\|x\| = 1$, we know that

(4.6-6)  $|(Ax, x) - \lambda| = |(Ex, x)| \le \epsilon.$ □

In the preceding three sections of this chapter, we showed connections
between the numerical analysis methods of finite differences and finite el- 
ements and numerical range considerations. The third basic numerical 
analysis method for partial differential equations is that of finite spectral 
methods. This method, which relies on the ability to do fast computa- 
tional integrals or transforms, offers very high resolution in practice but 
considerable difficulty in other ways, such as boundary values, stability, and 
proofs of convergence. One motivation for developing a theory of pseudo- 
eigenvalues was to help with the stability of spectral methods. Thus the 
considerations of this section may be regarded as extensions and variations 
on Theorem 4.3-1 by replacing the needed estimates with respect to the 
unit disk |z| < 1 by an arbitrary stability region. 


Notes and References for Section 4.6 


The first definition of e-pseudo-eigenvalues was apparently by 

J. M. Varah (1979). “On the Separation of Two Matrices,” SIAM J. Nu- 
mer. Anal. 16, 216-222. 

where they were called “e-eigenvalues.” Considerable work more recently 
began in 

L. N. Trefethen (1990). “Approximation Theory and Numerical Linear 
Algebra,” in Algorithms for Approximation II, eds. J. Mason and M. Cox,
Chapman, London, 336-360. 

where they were originally called “approximate eigenvalues.” An applica- 
tion to numerical methods of spectral type may be found in 

S. Reddy and L. N. Trefethen (1992). “Stability and the Method of Lines,” 
Numer. Math. 62, 235—267. 

where it is shown that a necessary and sufficient condition for stability is 
that the e-pseudo-spectrum of the spatial discretization operator lie within 
a distance O(e) + O(k) of the stability region as € and the time step k tend 
to zero. 

It has been known for a long time by experienced practitioners in nu- 
merical modeling in which the discretizations involve nonnormal operators 
that one must go beyond the spectrum o(A) to understand stability. See, 
for example, the discussion in 
P. J. Schmid, D. S. Henningson, M. Khorrami, and M. R. Malik (1993). 
“A Study of Eigenvalue Sensitivity for Hydrodynamic Stability Operators,” 
Theoret. Comput. Fluid Dynamics 4, 227-240. 
where the sensitivity properties of the eigenvalues are related to transient 
properties of the flow. The introduction of the notion of pseudo eigenval- 
ues is a mathematical way of trying to create a general theory for such 
instabilities, see 
L. N. Trefethen, A. E. Trefethen, S. Reddy, T. Driscoll (1993).
“Hydrodynamic Stability without Eigenvalues,” Science 261, 578-584.
where it is shown that very small perturbations to even very smooth
(Couette) flow can produce error amplification of $O(10^5)$ by a linear
mechanism, even though all eigenmodes decay monotonically.

An interesting predecessor paper to pseudo eigenvalues is 
S. Parter (1962). “Stability, Convergence, and Pseudo-Stability of Finite- 
Difference Equations for an Over-Determined Problem,” Numerische Math- 
ematik 4, 277-292. 
As we mentioned in Section 4.5, the Lax-Wendroff scheme is generally
stable so long as

$\Delta t < \Delta x\,(|u| + c)^{-1}.$

The mathematical meaning of this is, roughly, that the scheme should not
advance faster than its characteristics. The physical meaning is, roughly,
that we want to prevent the simulated Euler equations (4.4-15) from
developing shocks. In the paper just mentioned, the easier transport equation

$u_t + a(x, t)\,u_x = g$

is considered along with initial and boundary conditions, and it is shown
that for the Lax-Wendroff scheme the Δt for stability must satisfy

$a\,\dfrac{\Delta t}{\Delta x} \le 1$

but that there is an interval of “pseudo-stability”

$1 \le a\,\dfrac{\Delta t}{\Delta x} < \sqrt{4/3}$

in which the scheme is generally unstable and nonconvergent, but
nonetheless single errors are eventually damped out.

Clearly (see Theorem 4.3-1), the notion of $\epsilon$-pseudo-spectra is related to
resolvent operators possessing certain estimates in terms of the distance of
$\lambda$ to the spectrum. Recall that for any operator T we know by (1.4-6) and
Schwarz’s inequality that

(4.6-7)  $d(\lambda, W(T)) \le \|(T - \lambda)^{-1}\|^{-1} \le d(\lambda, \sigma(T)).$


This is the basic framework within which the theory and application of 
e-pseudo-spectra lie. It is also the framework of several of the classes of 
operators that we will study in Chapter 6. 

We remark that the notion and results for $\epsilon$-pseudo-eigenvalues for
matrices could be extended to arbitrary operators, e.g., by replacing the
point spectra $\sigma_p(A)$ by the full spectrum $\sigma(T)$ and the numerical range
W(A) by its closure $\overline{W(A)}$, but we shall not investigate such
generalizations here.

Certainly the strongest argument for bringing numerical range and numer-
ical analysis closer together in the future is the stability of W (T) under 
small perturbations. As we stated in the Notes of Section 4.6, it has been 
long known in scientific circles and the numerical analysis community that 
once one encounters nonnormal operators or nonnormal discretizations of 
physical processes, small errors in data or measurement can produce poten- 
tially highly erroneous simulations. This lore underlies the recent interest 
in pseudo eigenvalues or, put another way, the latter is a recent manifesta- 
tion of the former. 

Indeed, it is quite amusing to select an n x n matrix A at random and 
numerically plot its numerical range W(A): algorithms to do this will be 
explained in the next chapter (Section 5.6). Then change an element or 
two of A slightly and you will see very little movement of W(A). This 
can be in great contrast to what is happening to the spectrum. Thus, if 
one can establish convergence and/or stability criteria in terms of W(A) or 
w(A), rather than in terms of eigenvalues or r(A), one is far more secure 
against incorrect conclusions due to physical data measurement precision 
limitations or machine epsilon occurrences. It should also be stressed that 
even if a physical theory can be given entirely in terms of symmetric or 
normal operators, actual physical measurements will always carry some 
departure or truncation from such theory. 

Another point has recently been emphasized by 
M. Eiermann (1993). “Fields of Values and Iterative Methods,” Linear 
Algebra Appl. 180, 167-197, 
namely, that if A is not normal, only asymptotic behavior of an iterative 
numerical method can be drawn from spectral information. To understand 
the progress of an iteration after a finite number of steps, one can often 
better use the numerical range. A similar viewpoint is taken also in the 
recent papers by 
G. Starke (1993). “Fields of Values and the ADI Method for Non-normal 
Matrices,” Linear Algebra Appl. 180, 199-218. 

O. Axelsson, H. Lu and B. Polman (1994). “On the Numerical Radius of 
Matrices and its Application to Iterative Solution Methods,” Linear and 
Multilinear Algebra 37, 225-238. 

Let us comment that even though the terminologies “$\epsilon$-eigenvalues” and
“approximate eigenvalues” gave way to the term “$\epsilon$-pseudo-eigenvalues,”
it should be noted that the term “pseudo-eigenvalue” appears in the theory
of spectral concentration, where an unstable eigenvalue of a Hamiltonian
operator has been converted by a perturbation to a point of steep spectral
slope in the continuous spectrum. Thus it could turn out that the original
terminology $\epsilon$-eigenvalue and $\epsilon$-spectra were not inappropriate.

The recently discovered fundamental connection between the operator
trigonometry of Chapter 3 and iterative methods for the solution of large
sparse systems Ax = b goes beyond steepest descent and conjugate gradient
algorithms as described in Sections 4.1 and 4.2, respectively. For some
of the recent developments, see

K. Gustafson (1996) “Operator Trigonometry of Iterative Methods,” Nu- 
mer. Lin. Alg. Applic., to appear. 

K. Gustafson (1996). “Operator Trigonometry of the Model Problem,” to 
appear. 

For example, the classic Richardson iteration $x_{k+1} = x_k + \alpha(b - Ax_k)$
with iteration matrix $G_\alpha = I - \alpha A$ is optimized at convergence rate
$\rho_{\mathrm{opt}} = \sin A$. As another example, Chebyshev polynomial preconditionings
may be seen to be optimized at midwidth(A)/center(A) = sin A. Similarly,
Jacobi, SOR, and SSOR achieve optimal convergence rates described by 
sin A, sin?/ 2 AY 2 and sin A!/?, respectively, where the operators A are 
defined in terms of upper and lower bounds for A relative to A’s diagonal 
or upper or lower triangular parts. 
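
As an illustration of the Richardson statement, the following Python sketch runs the iteration on a diagonal symmetric positive definite matrix with the step α = 2/(λ_min + λ_max). It assumes the operator-trigonometry identity sin A = (λ_max - λ_min)/(λ_max + λ_min) for such A, taken here from Chapter 3; the diagonal test matrix and the function names are likewise assumptions made for illustration.

```python
def optimal_richardson(lam_min, lam_max):
    """Optimal step alpha and the resulting rate (lam_max-lam_min)/(lam_max+lam_min)."""
    alpha = 2.0 / (lam_min + lam_max)
    rate = (lam_max - lam_min) / (lam_max + lam_min)
    return alpha, rate


def richardson(diag, b, alpha, steps):
    """Richardson iteration x <- x + alpha*(b - A x) for A = diag(diag), x0 = 0."""
    x = [0.0] * len(b)
    for _ in range(steps):
        x = [xi + alpha * (bi - di * xi) for xi, bi, di in zip(x, b, diag)]
    return x


diag = [1.0, 2.0, 4.0]                 # eigenvalues of the sample matrix
b = [1.0, 1.0, 1.0]
exact = [bi / di for bi, di in zip(b, diag)]
alpha, rate = optimal_richardson(min(diag), max(diag))
x10 = richardson(diag, b, alpha, 10)
max_err = max(abs(xi - ei) for xi, ei in zip(x10, exact))
```

The error components belonging to the extreme eigenvalues are multiplied by ±rate at every step, so after ten steps the largest error equals rate^10 times its initial value.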

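Moreover, the observed superlinear convergence rate of conjugate gradient
methods can be explained as follows. The $\sin(A^{1/2})$ convergence rate
of (4.2-1) can be expressed as

$\sin(A^{1/2}) = \dfrac{\lambda_{\max}^{1/2} - \lambda_{\min}^{1/2}}{\lambda_{\max}^{1/2} + \lambda_{\min}^{1/2}} \approx 1 - 2\left(\dfrac{\lambda_1}{\lambda_n}\right)^{1/2}.$

In gradient descent the error converges to the subspace $V = \mathrm{sp}\{x_1, x_n\}$.
However, the conjugate gradient error moves toward $V^\perp$. On $V^\perp$ the
operator A has reduced spectrum $\lambda_2 \le \cdots \le \lambda_{n-1}$. Thus the conjugate
gradient error rate improves to

$\sin\big((A|_{V^\perp})^{1/2}\big) = \dfrac{\lambda_{n-1}^{1/2} - \lambda_2^{1/2}}{\lambda_{n-1}^{1/2} + \lambda_2^{1/2}} \approx 1 - 2\left(\dfrac{\lambda_2}{\lambda_{n-1}}\right)^{1/2}.$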
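
The improvement in rate is easy to see on a sample spectrum. The following Python sketch evaluates the ratio of square roots above first for the full spectrum and then for the spectrum with its two extreme eigenvalues removed. The sample eigenvalues and the function name are assumptions made for illustration.

```python
import math


def sin_of_sqrt(lam_lo, lam_hi):
    """(sqrt(lam_hi) - sqrt(lam_lo)) / (sqrt(lam_hi) + sqrt(lam_lo))."""
    s_lo, s_hi = math.sqrt(lam_lo), math.sqrt(lam_hi)
    return (s_hi - s_lo) / (s_hi + s_lo)


spectrum = [1.0, 4.0, 9.0, 16.0, 100.0]               # assumed eigenvalues of A
full_rate = sin_of_sqrt(spectrum[0], spectrum[-1])     # uses lambda_1 and lambda_n
reduced_rate = sin_of_sqrt(spectrum[1], spectrum[-2])  # uses lambda_2 and lambda_{n-1}
```

For this spectrum the rate drops from 9/11 to 1/3 once the extreme eigenvalues no longer matter.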

In computational partial differential equations, a key model problem for 
testing iterative solvers is the Dirichlet Problem on the unit square. More- 
over, often in applications it is precisely this computation which consumes 
the bulk of the computing time. For expositions of the properties of
important iterative solvers on this model problem, see, for example,
D. M. Young (1971). Iterative Solution of Large Linear Systems, Academic 
Press, New York. 

W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equa- 
tions, Springer-Verlag, Berlin. 

For this model problem, with discretized Laplacian $A_h$ and finite
difference grid size h, it is now known that $\sin A_h = \cos(\pi h)$ and that the
first antieigenvalue of $A_h$ is $\mu = \sin(\pi h)$. Thus the maximum turning
angle $\phi(A_h)$ increases with decreasing grid size h. Moreover, the full
operator trigonometry of $A_h$ may be seen to depend upon a set of related
“harmonic” grids.
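
These two identities can be checked from the known eigenvalues of the five-point Laplacian on the unit square. The following Python sketch assumes the standard closed form 4 sin²(pπh/2) + 4 sin²(qπh/2), p, q = 1, ..., n, for the eigenvalues of h² A_h with h = 1/(n + 1), together with the expressions sin A = (λ_max - λ_min)/(λ_max + λ_min) and μ = 2(λ_min λ_max)^{1/2}/(λ_min + λ_max) for a symmetric positive definite matrix. Those formulas and the function name are assumptions brought in for illustration.

```python
import math


def model_problem_trig(n):
    """Return (h, sin A_h, first antieigenvalue) for n interior points per side."""
    h = 1.0 / (n + 1)
    eigs = [4 * math.sin(p * math.pi * h / 2) ** 2
            + 4 * math.sin(q * math.pi * h / 2) ** 2
            for p in range(1, n + 1) for q in range(1, n + 1)]
    lo, hi = min(eigs), max(eigs)
    sin_A = (hi - lo) / (hi + lo)
    mu = 2 * math.sqrt(lo * hi) / (hi + lo)
    return h, sin_A, mu
```

The common scaling by h² cancels in both ratios, so the result depends on the grid only through πh.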


5


Finite Dimensions 


Introduction 


The theory of numerical range in finite-dimensional spaces is very rich and 
varied. In fact, a lot of recent research has been focused on the numeri- 
cal range, and its variations, in finite dimensions. Avoiding the evidently 
impossible task of doing justice to all of the work done in this field, we 
attempt to present a representative selection and hope that it covers all 
the basic material. 

Consistent with the rest of the book, we use the usual inner product in
$C^n$. Thus, we have for $x, y \in C^n$, $x = (x_1, x_2, \ldots, x_n)$,
$y = (y_1, y_2, \ldots, y_n)$,

$(x, y) = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n.$

If A is an operator represented by an n x n matrix $A = (a_{ij})$, then
$\|A\| = \sup\{\|Ax\| : x \in C^n, \|x\| = 1\}$. We denote by $M_n$ the set of
n x n matrices acting in $C^n$.

In this chapter, we study some basic properties of the field of values and 
its localization. Further, we look at two special topics that have received 
considerable attention, namely Hadamard products and special (not clas- 
sical) numerical ranges. We also indicate how one may compute W(A) for 
matrices A and give some examples. 


5.1. Value Field 


We already know that the numerical range is convex (Theorem 1.1-2). In 
addition, we have the following result. 


Theorem 5.1-1. The numerical range of any matrix A acting in $C^n$ is
compact and the numerical radius is attained.


Proof. The function $x \to (Ax, x)$ from the compact set $S = \{x : \|x\| = 1\}$
into C is continuous, so its image W(A) is compact. Further, the
real-valued function $x \to |(Ax, x)|$ from S to R attains its maximum
value. □


K. E. Gustafson et al., Numerical Range 
We note that the spectral inclusion is more transparent in finite
dimensions since for any $\lambda \in \sigma(T)$, we have $\lambda x = Tx$,
$\|x\| = 1$, and $\lambda = \langle \lambda x, x\rangle = \langle Tx, x\rangle \in W(T)$.
Furthermore, we have the following inclusion theorem
for the numerical ranges of submatrices. 


Theorem 5.1-2 (Submatrix inclusion). Let $A = (a_{ij}) \in M_n$. Let $A(J)$
denote the submatrix having the elements of $A$ in the rows and columns
given by an index set $J \subset \{1, 2, \ldots, n\}$. Then
$W(A(J)) \subset W(A)$.


Proof. Let $J = \{j_1, j_2, \ldots, j_k\}$,
$1 \le j_1 < j_2 < \cdots < j_k \le n$. Let $x \in \mathbb{C}^k$,
$\|x\| = 1$. By inserting zeros in the right places, we can obtain a vector
$y \in \mathbb{C}^n$ such that $\|y\| = 1$. The vector $y$ looks like
$(\ldots, 0, x_{j_1}, 0, \ldots, x_{j_2}, 0, \ldots, x_{j_k}, 0, \ldots)$,
where the components of $x$ are labeled by $J$. Then
$\langle A(J)x, x\rangle = \langle Ay, y\rangle \in W(A)$. □
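The zero-padding step of this proof is easy to check numerically. The sketch below is illustrative only: the matrix size, the index set `J` (0-based), and the random test data are assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 5, [0, 2, 3]                       # hypothetical size and index set (0-based)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_J = A[np.ix_(J, J)]                     # principal submatrix A(J)

x = rng.standard_normal(len(J)) + 1j * rng.standard_normal(len(J))
x /= np.linalg.norm(x)                    # unit vector in C^k

y = np.zeros(n, dtype=complex)
y[J] = x                                  # insert zeros outside J, so ||y|| = 1

lhs = np.vdot(x, A_J @ x)                 # <A(J)x, x>  (vdot conjugates its first argument)
rhs = np.vdot(y, A @ y)                   # <Ay, y>
print(lhs, rhs)                           # the two values coincide
```

Every point of $W(A(J))$ is therefore produced by a unit vector of $\mathbb{C}^n$, which is the inclusion claimed.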

In the case of a symmetric matrix, the spectral inclusion has an additional
feature.


Theorem 5.1-3. The numerical range of a symmetric matrix $A$ is the real
interval $[m, M]$, where $m$ and $M$ are the least and greatest eigenvalues
of $A$, respectively.


Proof. We know that $W(A)$ is a compact convex set on the real line. Let
$W(A) = [m, M]$. Since $W(A)$ is closed, $m, M \in \sigma(A)$ by Theorem
1.2-4. By the spectral inclusion, we conclude that $m$ and $M$ are the
minimum and maximum values of $\sigma(A)$. □


Theorem 5.1-4. The numerical range of a unitary matrix A is a polygon 
inscribed in the unit circle. 


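Proof. Let $Ax = \lambda x$. Then $\|Ax\|^2 = \|x\|^2 = |\lambda|^2\|x\|^2$.
Thus $\lambda$ is on the unit circle. Since $A$ is normal,
$W(A) = \operatorname{co}\sigma(A)$ by Theorem 1.4-4, and we have
$W(A) = \operatorname{co}(\lambda_1, \lambda_2, \ldots, \lambda_n)$;
$|\lambda_i| = 1$, $i = 1, 2, \ldots, n$. □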

Let us now look at some applications of the numerical range in matrix 
theory. In particular, we prove a Schur triangularization and a spectral 
decomposition of a normal matrix. 


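Theorem 5.1-5. For each $T \in M_2$, there is a unitary matrix $U \in M_2$
such that the two main diagonal entries of $U^*TU$ are equal.


Proof. Consider $B = T - \frac{1}{2}(\operatorname{tr} T)I$. Then
$\operatorname{tr} B = 0$ = the sum of the eigenvalues of $B$. If $\lambda$
and $-\lambda$ are the eigenvalues of $B$, we have $\lambda \in W(B)$ and
$-\lambda \in W(B)$ by spectral inclusion,
$\frac{1}{2}(\lambda + (-\lambda)) = 0$ is a convex combination of two
elements of $W(B)$, and hence $0 \in W(B)$. So, there is a vector
$v \in \mathbb{C}^2$, $v = (v_1, v_2)$, $\|v\| = 1$, such that
$\langle Bv, v\rangle = 0$. Define the unitary operator $U$ by the matrix

$U = \begin{pmatrix} v_1 & -\bar{v}_2 \\ v_2 & \bar{v}_1 \end{pmatrix}.$

Then, we can see that

$U^*BU = \begin{pmatrix} 0 & a \\ b & c \end{pmatrix},$

$a, b, c \in \mathbb{C}$, and $c = 0$ since $\operatorname{tr} B = 0$.
Hence both diagonal entries of
$U^*TU = U^*BU + \frac{1}{2}(\operatorname{tr} T)I$ equal
$\frac{1}{2}\operatorname{tr} T$. □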
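The construction in this proof can be sketched in code. The proof only asserts that a unit vector $v$ with $\langle Bv, v\rangle = 0$ exists; the recipe used below to find one (combine the two unit eigenvectors $u_1, u_2$ of $B$ as $u_1 + e^{i\theta}u_2$ with $\theta = \arg\langle u_1, u_2\rangle$) is an assumption of this sketch, valid when $B$ is diagonalizable, and is not taken from the text. The function name is hypothetical.

```python
import numpy as np

def equalize_diagonal_2x2(T):
    """Return a unitary U such that U* T U has two equal diagonal entries.
    Sketch only: assumes B = T - (tr T / 2) I is diagonalizable."""
    T = np.asarray(T, dtype=complex)
    B = T - 0.5 * np.trace(T) * np.eye(2)        # trace-zero part of T
    _, vecs = np.linalg.eig(B)
    u1, u2 = vecs[:, 0], vecs[:, 1]              # unit eigenvectors for lambda, -lambda
    theta = np.angle(np.vdot(u2, u1))            # argument of <u1, u2>
    v = u1 + np.exp(1j * theta) * u2             # this choice gives <Bv, v> = 0
    v /= np.linalg.norm(v)
    # First column v, second column orthogonal to v, as in the proof.
    return np.array([[v[0], -np.conj(v[1])],
                     [v[1],  np.conj(v[0])]])

T = np.array([[1.0, 2.0], [3.0j, 4.0]])          # hypothetical test matrix
U = equalize_diagonal_2x2(T)
M = U.conj().T @ T @ U
print(np.round(M, 6))                            # both diagonal entries equal tr(T)/2 = 2.5
```

The same routine is what the induction in the next theorem would call at its base case.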


Theorem 5.1-6. For each $T \in M_n$, there is a unitary matrix
$U_n \in M_n$ such that all the main diagonal elements of $U_n^*TU_n$ are
equal.


Proof. We use induction with Theorem 5.1-5 as the starting point. Let
$B = T - \frac{1}{n}(\operatorname{tr} T)I$. Then $\operatorname{tr} B = 0$.
If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues (possibly
repeated) of $B$, we have
$\sum_{k=1}^{n} \lambda_k = \operatorname{tr} B = 0$. Since for each
$k = 1, 2, \ldots, n$, $\lambda_k \in W(B)$, we have
$\frac{1}{n}(\lambda_1 + \cdots + \lambda_n) = 0 \in W(B)$ by the convexity
of $W(B)$. Hence, there is a unit vector $v \in \mathbb{C}^n$ such that
$\langle Bv, v\rangle = 0$. Let $V_n$ be a unitary matrix with $v$ as the
first column. Then $V_n^*BV_n$ has the matrix

$\begin{pmatrix} 0 & a \\ b & D_{n-1} \end{pmatrix},$

where $a = (a_2, \ldots, a_n)$, $b = (b_2, b_3, \ldots, b_n)^T$,
$a, b \in \mathbb{C}^{n-1}$, and $D_{n-1} \in M_{n-1}$.

By the induction hypothesis, there exists a unitary matrix
$V_{n-1} \in M_{n-1}$ such that $V_{n-1}^*D_{n-1}V_{n-1}$ has all the main
diagonal elements equal. Extend the unitary matrix $V_{n-1}$ to a unitary
matrix $W_n$ by

$W_n = \begin{pmatrix} 1 & 0 \\ 0 & V_{n-1} \end{pmatrix}.$

Then $U_n = V_nW_n$ is the desired unitary matrix. □

We will now use the numerical range to develop a sequence of results 
leading to the spectral decomposition of a normal operator. 

Let us first recall that (Theorems 1.4-1 to 1.4-5) when the matrix $A$ is
normal, $r(A) = w(A) = \|A\|$, $W(A) = \operatorname{co}\sigma(A)$, and the
extreme points are eigenvalues if $W(A)$ is closed. We then immediately
have the following theorem.


Theorem 5.1-7. The extreme points of W(A) are eigenvalues of a normal 
matrix A. 


Proof. $W(A)$ is closed (Theorem 5.1-1). Now use Theorem 1.4-5. □

The boundary of the numerical range of a normal $A$ has special relations
to its spectrum, as the following theorems reveal.

Theorem 5.1-8. Let $A$ be a normal matrix. Then


(5.1-1) $\partial W(A) \cap \sigma(A) \ne \emptyset.$


Proof. W(A) is a compact convex set and hence is the convex hull of its 
extreme points (a nonempty set). These extreme points are eigenvalues 
(Theorem 5.1-7). O 

The last result, apparently trivial, leads us to the spectral decomposition 
of a normal matrix. We start with an eigenvalue on the boundary and the 
corresponding eigenspace and then proceed to repeat the procedure on the 
complement. The following theorem reveals the significance of such an 
eigenvalue. 


Theorem 5.1-9. Let $A$ be a matrix, not necessarily normal, and $\alpha$ an
eigenvalue on the boundary of the numerical range. Then, the dimension
of the eigenspace $M$ for $\alpha$ is equal to its algebraic multiplicity,
and $A$ is unitarily equivalent to $\alpha I_M \oplus B$, where
$\alpha \notin \sigma(B)$.


Proof. Let $x$ be an eigenvector corresponding to $\alpha$,
$\alpha x = Ax$. We can, by use of a unitary transformation, choose $x$ as
$(1, 0, 0, \ldots)$ and $Ax = \alpha x = (\alpha, 0, 0, \ldots)$. Evidently,


(5.1-2) $A = \begin{pmatrix} \alpha & * \\ 0 & A_1 \end{pmatrix}.$


Now let us choose a principal $2 \times 2$ submatrix $B$ of $A$, given by
the indices 1 and $i$, $i \in \{2, \ldots, n\}$,

$B = \begin{pmatrix} \alpha & c \\ 0 & \beta \end{pmatrix}.$


$B$ has eigenvalues $\alpha$ and $\beta$, $\alpha \in W(B)$ by spectral
inclusion, and $W(B) \subset W(A)$ by Theorem 5.1-2. Since
$\alpha \in \partial W(A)$, it follows that $\alpha \in \partial W(B)$.
Since $W(B)$ is an ellipse with foci $\alpha$ and $\beta$, and $\alpha$ is
a focus on the boundary of the ellipse, the ellipse must be degenerate and
we must have $W(B) = [\alpha, \beta]$. This implies by Theorem 1.4-1 that
$B$ is normal. The condition $B^*B = BB^*$ now implies $c = 0$. Thus, we
see that

$A = \begin{pmatrix} \alpha & 0 \\ 0 & A_1 \end{pmatrix}.$

If $\alpha$ has multiplicity greater than 1, we can repeat the process
until we obtain a unitarily equivalent matrix

$\begin{pmatrix} \alpha I_k & 0 \\ 0 & A_{k+1} \end{pmatrix},$

where $k$ is the multiplicity of $\alpha$. Notice that the $k$-dimensional
eigenspace corresponding to $\alpha$ is a reducing subspace of $A$. □
Now, let us assume that $A$ is normal. Then $A_{k+1}$ is normal on an
$(n-k)$-dimensional space, and for any $b \in \sigma(A)$, $b \ne \alpha$,
we have $b \in \sigma(A_{k+1})$. We now use Theorem 5.1-8 and conclude that
some such $b$ lies on $\partial W(A_{k+1})$. We can now repeat the argument
in Theorem 5.1-9 and obtain the following Corollary.


Corollary 5.1-10. If the operator $A$ is normal, it can be decomposed in
the form


$A = A_1 \oplus A_2 \oplus \cdots \oplus A_\ell,$

where each $A_i$ is normal and $\sigma(A_i) \subset \partial W(A_i)$ for
$i = 1, 2, \ldots, \ell$.


Proof. Similar to that of Theorem 5.1-9. O 


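Corollary 5.1-11. If $Ax = \alpha x$, $Ay = \lambda y$,
$\alpha \ne \lambda$, and $\alpha \in \partial W(A)$, then
$\langle x, y\rangle = 0$.


Proof. From (5.1-2), with $x = (1, 0, 0, \ldots)$ and the off-diagonal
block equal to zero as in the proof of Theorem 5.1-9, we have
$\lambda \in \sigma(A_1)$ and so the first component of $y$ is zero. So
$\langle x, y\rangle = 0$. □

The last fact can be used to deduce the converse of Corollary 5.1-10.


Theorem 5.1-12. If $A$ is the direct sum
$A = A_1 \oplus A_2 \oplus \cdots \oplus A_\ell$, where each $A_i$ has the
property $\sigma(A_i) \subset \partial W(A_i)$, then $A$ is normal.


Proof. Each $A_i$ is normal, because each $A_i$ has a complete set of
orthogonal eigenvectors, by Corollary 5.1-11. Hence $A$ is normal. □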


Notes and References for Section 5.1 


Most of the elementary properties of $W(A)$ have already been studied in
Chapter 1 and apply in the finite-dimensional case. The eigenvalues
appearing in Theorem 5.1-9 and Corollary 5.1-10 appear in

C. R. Johnson (1976). “Normality and the Numerical Range,” Lin. Alg. 
Appl. 15, 89-94. 

Such eigenvalues are called normal eigenvalues. 


5.2 Gersgorin Sets 
The numerical range of a sum of matrices can be located, at least roughly, 
since 

$W(A + B) \subset W(A) + W(B) = \{\lambda + \mu : \lambda \in W(A), \mu \in W(B)\}.$


Such a relation is not true for the spectra. So, one way to localize
$\sigma(A + B)$ is the inclusion
$\sigma(A + B) \subset W(A + B) \subset W(A) + W(B)$. This is perhaps
quite an important motivation in obtaining sets containing $W(T)$. Such set
containment was first obtained by Gersgorin (see Notes) for the spectrum.
Since the results we mention use proofs similar to those of Gersgorin, it is
convenient to look at his proof.


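Theorem 5.2-1. Let $A = [a_{ij}] \in M_n$. Define the deleted row sums
$D_i(A) = \sum_{j \ne i} |a_{ij}|$ and the discs
$\mathcal{D}_i(A) = \{z : |z - a_{ii}| \le D_i(A)\}$, $1 \le i \le n$. Then
$\sigma(A) \subset \bigcup_{i=1}^{n} \mathcal{D}_i(A) = \mathcal{D}(A)$.


Proof. Let $\lambda \in \sigma(A)$, $Ax = \lambda x$, $x \ne 0$. If $x$ is
the $n$-tuple $(x_1, x_2, \ldots, x_n)$, choose $k$ so that
$|x_k| = \max\{|x_1|, |x_2|, \ldots, |x_n|\}$. We then have
$\lambda x_k = \sum_{i=1}^{n} a_{ki}x_i$ and so

(5.2-1) $\lambda x_k - a_{kk}x_k = \sum_{i \ne k} a_{ki}x_i.$


Hence $|\lambda - a_{kk}||x_k| \le \sum_{i \ne k} |a_{ki}||x_i| \le \sum_{i \ne k} |a_{ki}||x_k|$.
Thus $|\lambda - a_{kk}| \le D_k(A)$.

Observe that we can repeat the above process with the columns of $A$
instead of the rows and obtain


(5.2-2) $\sigma(A) \subset \bigcup_{i=1}^{n} \mathcal{D}'_i(A) = \mathcal{D}'(A),$


where $\mathcal{D}'_j(A) = \{z : |z - a_{jj}| \le D'_j(A)\}$ and

$D'_j(A) = \sum_{i \ne j} |a_{ij}|.$

Thus $\sigma(A) \subset \mathcal{D}(A) \cap \mathcal{D}'(A)$. Notice that
$D'_i(A) = D_i(A^*)$. □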
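As a quick numerical illustration of Theorem 5.2-1, the sketch below computes the deleted row and column sums and checks that every eigenvalue falls in at least one row disc and at least one column disc. The test matrix is the companion matrix used later in Example 3; the helper name `deleted_sums` is hypothetical.

```python
import numpy as np

def deleted_sums(A):
    """Deleted absolute row sums D_i(A) and column sums D'_i(A)."""
    absA = np.abs(A)
    d = np.diag(absA)
    return absA.sum(axis=1) - d, absA.sum(axis=0) - d

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [2, -4, 3]], dtype=complex)
D_row, D_col = deleted_sums(A)
centers = np.diag(A)
eigs = np.linalg.eigvals(A)

# For each eigenvalue, test membership in some row disc and some column disc.
in_rows = [bool(np.any(np.abs(lam - centers) <= D_row + 1e-12)) for lam in eigs]
in_cols = [bool(np.any(np.abs(lam - centers) <= D_col + 1e-12)) for lam in eigs]
print(D_row, D_col)
print(in_rows, in_cols)
```

Because both families of discs contain the spectrum, so does their intersection, as noted at the end of the proof.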
The following theorem gives a Gersgorin-type set containing the numerical
range.


Theorem 5.2-2. Let $A = [a_{ij}] \in M_n$. With the notation of the previous
theorem, $W(A)$ is contained in the set


$S = \operatorname{co}\left( \bigcup_{i=1}^{n} \{z : 2|z - a_{ii}| \le D_i(A) + D'_i(A)\} \right).$


Proof. Consider any line of support of the convex set $S$. We will show
that $W(A)$ is also on the same side of this line of support. Without loss
of generality, we may assume that $S$ is contained in the right half-plane
$\operatorname{Re} z \ge 0$ and show that $W(A)$ is contained therein. By
Theorem 5.2-1 applied to $\operatorname{Re} A = \frac{A + A^*}{2} = B$,
with diagonal entries $b_{ii} = \operatorname{Re} a_{ii}$, we have


(5.2-3) $\sigma(B) \subset \bigcup_{i=1}^{n} \{z : 2|z - b_{ii}| \le D_i(A + A^*)\}.$

Notice that, by the triangle inequality,

(5.2-4) $D_i(B) \le \frac{D_i(A) + D_i(A^*)}{2} = \frac{D_i(A) + D'_i(A)}{2}.$

Since $S \subset \{z : \operatorname{Re} z \ge 0\}$, we have
$\operatorname{Re} a_{ii} = b_{ii} \ge \frac{D_i(A) + D'_i(A)}{2}$, and
since the last set
$\{z : |z - b_{ii}| \le \frac{D_i(A) + D'_i(A)}{2}\}$ is contained in
$\{z : \operatorname{Re} z \ge 0\}$, so is $\mathcal{D}_i(B)$. Hence
$\sigma(B) \subset \{z : \operatorname{Re} z \ge 0\}$. As $B$ is
selfadjoint, $W(B) = \operatorname{co}\sigma(B)$ and hence
$W(B) \subset \{z : \operatorname{Re} z \ge 0\}$.

Since $A = B + \frac{1}{2}(A - A^*)$, we have
$W(A) \subset W(B) + W(\frac{1}{2}(A - A^*))$, and
$W(\frac{1}{2}(A - A^*))$ is purely imaginary. Hence $W(A)$ is contained in
the set $\{z : \operatorname{Re} z \ge 0\}$. □


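Corollary 5.2-3. The numerical radius satisfies

(5.2-5) $w(A) \le \max_{1 \le i \le n} \left( |a_{ii}| + \frac{D_i(A) + D'_i(A)}{2} \right) = \max_{1 \le i \le n} \frac{1}{2} \sum_{j=1}^{n} \left( |a_{ij}| + |a_{ji}| \right).$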


Proof. The right-hand side of this inequality is the maximum absolute 
value of the elements in S. O 
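The bound of Corollary 5.2-3 can be compared with a direct estimate of $w(A)$. The sketch below assumes the standard characterization $w(A) = \max_\theta \lambda_{\max}(\operatorname{Re}(e^{i\theta}A))$, sampled on a finite grid of angles, so `numerical_radius` returns a lower estimate of $w(A)$ in general; both function names are hypothetical. The test matrix is the one from Example 2 of this section.

```python
import numpy as np

def numerical_radius(A, m=720):
    """Estimate w(A) as the maximum over m angles theta of the largest
    eigenvalue of the Hermitian part of exp(i*theta) * A."""
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, m, endpoint=False):
        H = np.exp(1j * theta) * A
        H = (H + H.conj().T) / 2.0
        best = max(best, np.linalg.eigvalsh(H)[-1])
    return best

def radius_bound(A):
    """Right-hand side of the row/column-sum bound for w(A)."""
    absA = np.abs(A)
    return 0.5 * np.max(absA.sum(axis=1) + absA.sum(axis=0))

A = np.array([[1, 2, 0],
              [0, 2, 2],
              [0, 0, 3]], dtype=complex)
w_est = numerical_radius(A)
bound = radius_bound(A)
print(w_est, bound)        # about 3.732 against the bound 4.0
```

For this entrywise nonnegative matrix the maximum is attained at $\theta = 0$, so the estimate equals $2 + \sqrt{3}$ up to rounding.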

Another method of obtaining a Gersgorin set containing $W(A)$ makes
use of the following interesting fact. If we subtract the diagonal from any
matrix, we have a matrix with zero trace. Thus, if $R = [r_{ij}]$ and


$R^0 = R - \operatorname{diag}(r_{11}, r_{22}, \ldots, r_{nn}),$


the sum of the eigenvalues of $R^0$ is zero. If, in addition, $R^0$ is
selfadjoint, we can conclude that some of the eigenvalues are negative and
the others positive (unless all of them are zeros). Then, the eigenvalues
of $R^0$ lie between $R^0_m$ and $R^0_M$, the minimum and maximum
eigenvalues of $R^0$. The following theorem applies this observation to the
real and imaginary parts of any given matrix.


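Theorem 5.2-4. Let $A = [a_{ij}] \in M_n$, $A = R + iS$. Then


$W(A) \subset \operatorname{co}\left( \bigcup_{i=1}^{n} B_i \right),$


where


(5.2-6) $B_i = \{z : R^0_m \le \operatorname{Re}(z - a_{ii}) \le R^0_M;\ S^0_m \le \operatorname{Im}(z - a_{ii}) \le S^0_M\}.$


Proof. Let $f \in \mathbb{C}^n$, $f = (f_1, f_2, \ldots, f_n)$,
$\|f\| = 1$. Then, with
$\gamma = \langle Af, f\rangle - \sum_{i=1}^{n} a_{ii}|f_i|^2$, we have
$\operatorname{Re}\gamma = \langle R^0f, f\rangle$ and
$\operatorname{Im}\gamma = \langle S^0f, f\rangle$. Obviously,
$R^0_m \le \operatorname{Re}\gamma \le R^0_M$ and
$S^0_m \le \operatorname{Im}\gamma \le S^0_M$. Since $\langle Af, f\rangle$
differs from $\gamma$ by some element of the convex hull of the $a_{ii}$,
the result follows. □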


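Corollary 5.2-5. $W(A) \subset \{z : |\operatorname{Re} z| \le \rho, |\operatorname{Im} z| \le \sigma\}$,
where $\rho = \max_i \sum_j |r_{ij}|$ and $\sigma = \max_i \sum_j |s_{ij}|$.


Proof. We have, using Theorem 5.2-1 and the notation of Theorem 5.2-4,
$|R^0_m| \le \rho$ and $|R^0_M| \le \rho$, $|S^0_m| \le \sigma$,
$|S^0_M| \le \sigma$. □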


Example 1. Let us first look at a perturbation of a shift in which all
the diagonal elements are zeros:


$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \frac{1}{2} & 0 & 0 & 0 \end{pmatrix}.$


By Theorem 5.2-2, with

$D_1(A) = D_2(A) = D_3(A) = 1$ and $D_4(A) = \frac{1}{2}$,
$D'_1(A) = \frac{1}{2}$, $D'_2(A) = D'_3(A) = D'_4(A) = 1$,

we have

$W(A) \subset \{z : |z| \le 1\}.$


On the other hand, the minimum and maximum eigenvalues of $R^0$ and $S^0$
are, respectively, $(-0.89, 0.89)$ and $(-0.89, 0.89)$. Thus by Theorem
5.2-4, $W(A)$ is contained in the rectangle


$\{-0.89 \le \operatorname{Re} z \le 0.89,\ -0.89 \le \operatorname{Im} z \le 0.89\}.$


The actual numerical range is shown in Fig. 1 of Section 5.6. 
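The numbers quoted in Example 1 can be reproduced with a few lines of NumPy. This is a sketch under one assumption: the matrix is read as the $4 \times 4$ shift with ones on the superdiagonal and the entry $\frac{1}{2}$ in the lower left corner, which is the reading consistent with the deleted sums listed in the example.

```python
import numpy as np

# Assumed reading of the matrix of Example 1.
A = np.array([[0,   1, 0, 0],
              [0,   0, 1, 0],
              [0,   0, 0, 1],
              [0.5, 0, 0, 0]], dtype=complex)

absA = np.abs(A)
D_row = absA.sum(axis=1) - np.diag(absA)     # deleted row sums D_i(A)
D_col = absA.sum(axis=0) - np.diag(absA)     # deleted column sums D'_i(A)
radii = (D_row + D_col) / 2.0                # disc radii of Theorem 5.2-2

R = (A + A.conj().T) / 2.0                   # real part of A
S = (A - A.conj().T) / 2.0j                  # imaginary part of A
R0 = R - np.diag(np.diag(R))                 # diagonal removed
S0 = S - np.diag(np.diag(S))
r_eigs = np.linalg.eigvalsh(R0)
s_eigs = np.linalg.eigvalsh(S0)

print(radii)                                 # largest radius is 1, centered at 0
print(r_eigs[0], r_eigs[-1])                 # about -0.89 and 0.89
print(s_eigs[0], s_eigs[-1])                 # about -0.89 and 0.89
```

Here the rectangle of Theorem 5.2-4 has half-width about 0.89, while the disc of Theorem 5.2-2 has radius 1, so neither set contains the other.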


Example 2. When the diagonal elements are not all equal, we get a
union of circles to bound $W(A)$. Consider


$A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}.$

By Theorem 5.2-2,

$D_1(A) = D_2(A) = 2$, $D_3(A) = 0$,


$D'_1(A) = 0$, $D'_2(A) = D'_3(A) = 2$.


Thus $W(A)$ is contained in the convex hull of the union of the circles
$\{z : |z - 1| \le 1\}$, $\{z : |z - 2| \le 2\}$, and
$\{z : |z - 3| \le 1\}$. On the other hand, using Theorem 5.2-4, $W(A)$ is
contained in the convex hull of the three rectangles, $R_1$, $R_2$, and
$R_3$:

$R_1 = \{z : -1.414 \le \operatorname{Re}(z - 1) \le 1.414;\ -1.414 \le \operatorname{Im}(z - 1) \le 1.414\},$
$R_2 = \{z : -1.414 \le \operatorname{Re}(z - 2) \le 1.414;\ -1.414 \le \operatorname{Im}(z - 2) \le 1.414\},$
and
$R_3 = \{z : -1.414 \le \operatorname{Re}(z - 3) \le 1.414;\ -1.414 \le \operatorname{Im}(z - 3) \le 1.414\}.$

Figure 2 in Section 5.6 shows $W(A)$ in this case.


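Example 3. Let us now look at the companion matrix of the polynomial
$\lambda^3 - 3\lambda^2 + 4\lambda - 2$ (see the next section),


$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -4 & 3 \end{pmatrix}.$


By Theorem 5.2-2,

$D_1(A) = 1$, $D_2(A) = 1$, $D_3(A) = 6$,


$D'_1(A) = 2$, $D'_2(A) = 5$, $D'_3(A) = 1$,


and hence $W(A) \subset \operatorname{co}\{C_1 \cup C_2 \cup C_3\}$, where
$C_1 = \{z : |z| \le \frac{3}{2}\}$, $C_2 = \{z : |z| \le 3\}$ and
$C_3 = \{z : |z - 3| \le \frac{7}{2}\}$. Theorem 5.2-4 gives
$W(A) \subset \operatorname{co}\{R_1, R_2\}$, where


$R_1 = \{z : -2.06 \le \operatorname{Re} z \le 4.57;\ -2.74 \le \operatorname{Im} z \le 2.74\},$
$R_2 = \{z : -2.06 \le \operatorname{Re}(z - 3) \le 4.57;\ -2.74 \le \operatorname{Im} z \le 2.74\}.$

$W(A)$ is given in Fig. 3 of Section 5.6.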


Notes and References for Section 5.2 


The original Gersgorin result was given in 
S. Gersgorin (1931). “Über die Abgrenzung der Eigenwerte einer Matrix,”
Izv. Akad. Nauk SSSR (Ser. Mat.) 7, 749-754. 

An early paper connecting the numerical range to the Gersgorin theory 
was 
F. Bauer (1968). “Fields of Values and Gershgorin Disks,” Numerische 
Math. 12, 91-95. 

A Gersgorin set for the numerical range in Theorem 5.2-2 was first given 
by 
C. R. Johnson (1973). “A Gersgorin inclusion set for the field of values of 
a finite matrix,” Proc. Amer. Math. Soc. 41, 57-60. 
It provides a fair approximation and makes equal use of the rows and 
columns. 
For any operator $T = R + iS$, we have

$W(T) \subset W(R) + iW(S).$


Thus, we always have the result that $W(T)$ is contained in the rectangle
determined by the minimum and maximum eigenvalues of $R$ and $S$. This
result is known as the Bendixson-Hirsch theorem.

Theorem 5.2-4 is a modification of this theorem obtained by deleting the 
diagonal. Further details can be seen in 
A. I. Mees and D. P. Atherton (1979). “Domains Containing the Field of 
Values of a Matrix,” Lin. Alg. Appl. 26, 289-296. 

Other Gersgorin sets for the spectrum and the numerical range can be 
found in 
V. N. Solov’ev (1983). “A Generalization of Gersgorin’s Theorem,” Izv. 
Akad. Nauk. SSSR Ser. Mat. 47, 1285-1302; English translation in Math. 
USSR Izv. 23 (1984), 
A. A. Abdurakmanov (1988). “The Geometry of the Hausdorff Domain 
in Localization Problems for the Spectrum of Arbitrary Matrices,” Math. 
USSR Sbornik. 59, 39-51. 
An excellent and easy-to-read account of Gersgorin sets for the spectrum 
is given in 
R. A. Brualdi and S. Mellendorf (1993). “Regions in the Complex Plane 
Containing the Eigenvalues of a Matrix,” Amer. Math. Monthly 101, 
975-985. 


5.3 Radius Estimates 


Upper bounds on the numerical radius, although elusive in the general case, 
are more available in some special cases. We consider, in particular, the 
cases of 0-1 matrices and companion matrices. 

A square matrix all of whose elements are 0 or 1 is called a 0-1 matrix.


Example.

$A = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$


Let us consider the general case in which $A$ is a 0-1 matrix with at most
one 1 in each row and each column. We can associate, in a unique fashion,
an injection $\sigma$, $\sigma : \pi \to \{1, 2, \ldots, n\}$, where
$\pi \subset \{1, 2, \ldots, n\}$ and $A = A(\sigma)$, where
$A(\sigma)_{i,j} = \chi(j \in \pi)\,\delta_{i,\sigma(j)}$. So, the $i,j$
component of $A = A(\sigma)$ is 1 only when $j \in \pi$ and
$\sigma(j) = i$. Otherwise, it is zero.
Let us, before going on, illustrate the injection $\sigma$ in the case of
the matrix $A$ given above. In column one, the 1 appears in row 3, so
$\sigma(1) = 3$, and in column three the 1 appears in row 4, so
$\sigma(3) = 4$ and $\sigma(4) = 1$. Thus, we complete a cycle (1,3,4).
Similarly, $\sigma(2) = 5$, $\sigma(5) = 6$, and there is no 1 in column 6.
Also there is no 1 in the second row. Thus, we terminate a chain (open
circuit) (2,5,6] of length 3. Notice that the cycle (1,3,4) and the chain
are disjoint.

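Returning to our general discussion, we can thus write $\sigma$ as the
composition $\theta * \tau$, where $\theta$ is the cycle and $\tau$ the
chain. Thus $\sigma = \theta * \tau$, $A(\sigma) = A$. If
$\phi \in S_n$, the symmetric group of degree $n$, we can show that
$A(\phi\sigma) = A(\phi)A(\sigma)$ and $A(\sigma\phi) = A(\sigma)A(\phi)$.
In fact,
$(A(\phi)A(\sigma))_{i,j} = \chi(j \in \pi, \sigma(j) = k, \phi(k) = i) = \chi(j \in \pi \text{ and } \phi\sigma(j) = i) = A(\phi\sigma)_{i,j}$.
In general we may write


$\sigma = \theta_1 * \theta_2 * \cdots * \theta_k * \tau_1 * \cdots * \tau_\ell,$


where the $\theta$s are cycles and the $\tau$s are chains, observing that
there may be degenerate cases (singletons). Let $\alpha_i$, $\beta_i$
denote the lengths of $\theta_i$ and $\tau_i$ and


$\gamma_t = \sum_{i=1}^{t} \alpha_i, \quad 1 \le t \le k, \quad \gamma_0 = 0,$


and $\gamma_t = \gamma_k + \sum_{i=1}^{t-k} \beta_i$,
$k + 1 \le t \le k + \ell$. It can be shown (see Marcus and Shure (1979),
Notes and References at the end of this section) that there is a
$\phi \in S_n$ such that simultaneously


$\phi\theta_i\phi^{-1} = \tilde{\theta}_i, \quad i = 1, 2, \ldots, k, \qquad \phi(\tau_i) = \tilde{\tau}_i, \quad i = 1, 2, \ldots, \ell,$


where $\tilde{\theta}_t = (\gamma_{t-1} + 1, \gamma_{t-1} + 2, \ldots, \gamma_t)$,
$t = 1, 2, \ldots, k$, and
$\tilde{\tau}_s = (\gamma_{k+s-1} + 1, \ldots, \gamma_{k+s}]$,
$1 \le s \le \ell$. Notice that the cumbersome notation is needed to
have the elements of successive cycles and chains from left to right in the
order $1, 2, \ldots, n$.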

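We can now in principle calculate $W(A)$ by writing $A$ as a direct sum
operator using two kinds of matrices. Let $P_m$ be the $m \times m$
permutation matrix that has 1 in the subdiagonal and 1 in the top
right-hand corner in the position $(1, m)$,


$P_m = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$


Let us observe that $w(P_m) = 1$, because we may obtain


$\sup\{|x_m\bar{x}_1 + x_1\bar{x}_2 + \cdots + x_{m-1}\bar{x}_m| : |x_1|^2 + \cdots + |x_m|^2 = 1\} = 1$

by taking $x_i = \frac{1}{\sqrt{m}}$, $i = 1, 2, \ldots, m$. Also,
$\|P_m\| = 1$. Let $S_m$, also written $Q_m$ below, denote the $m$-shift


$S_m = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$


Recall that $w(S_m) = \cos\frac{\pi}{m+1}$ (Section 1.3). Notice that the
$\tilde{\theta}_i$ and $\tilde{\tau}_i$ correspond to $P_m$ and $Q_m$. We
can now write

(5.3-1) $A = P_{\alpha_1} \oplus P_{\alpha_2} \oplus \cdots \oplus P_{\alpha_k} \oplus Q_{\beta_1} \oplus \cdots \oplus Q_{\beta_\ell}.$


In principle, $W(A)$ can now be calculated as the convex hull of the
numerical ranges of the summands of this direct sum.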

Notice that $w(P_i) = 1$ for all $i \in \{\alpha_1, \ldots, \alpha_k\}$ and
$w(Q_i) = \cos\frac{\pi}{i+1}$, $i \in \{\beta_1, \ldots, \beta_\ell\}$. To
summarize, we may state the following theorem.


Theorem 5.3-1 (Graph radius). Let $A$ be a square $n \times n$ 0-1 matrix
with at most one 1 in each row and column. Then $w(A) = 1$ if the incidence
graph $G(A)$ possesses a nontrivial cycle; otherwise
$w(A) = \cos(\pi/(\beta + 1))$, where $\beta$ is the number of vertices in
the longest chain in the graph.


Proof. This follows from the preceding discussion. Observe that the above 
computation does not necessarily depend on every row or column having 
at most one 1. The essential condition is that o can be broken into cycles 
and chains. O 
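Theorem 5.3-1 can be illustrated on the $6 \times 6$ example above. The sketch below builds $A$ from the injection $\sigma$ read off in the text and estimates the numerical radius by the standard characterization $w(A) = \max_\theta \lambda_{\max}(\operatorname{Re}(e^{i\theta}A))$ on a grid of angles; this estimator and the name `numerical_radius` are assumptions of the sketch, not the method of the text.

```python
import numpy as np

def numerical_radius(A, m=360):
    """Estimate w(A) as the maximum over m angles theta of the largest
    eigenvalue of the Hermitian part of exp(i*theta) * A."""
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, m, endpoint=False):
        H = np.exp(1j * theta) * A
        H = (H + H.conj().T) / 2.0
        best = max(best, np.linalg.eigvalsh(H)[-1])
    return best

# Injection sigma of the example: cycle (1,3,4) and chain (2,5,6].
sigma = {1: 3, 3: 4, 4: 1, 2: 5, 5: 6}
A = np.zeros((6, 6))
for j, i in sigma.items():
    A[i - 1, j - 1] = 1.0          # entry (i, j) is 1 exactly when sigma(j) = i

chain = np.eye(3, k=-1)            # the 3-shift carried by the chain alone

w_A = numerical_radius(A)          # a cycle is present, so this should be 1
w_chain = numerical_radius(chain)  # should be cos(pi/4)
print(w_A, w_chain, np.cos(np.pi / 4))
```

Removing the entry that closes the cycle, for instance setting `A[0, 3] = 0`, leaves two chains of three vertices each and the estimate drops to $\cos(\pi/4)$.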

Figure 4 of Section 5.6 shows W(A) for the 6 x 6 0-1 matrix A given 
above. Let us consider some other examples. 


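Example. Let $S$ be the nine-dimensional right shift $S_9$,


$S = \begin{pmatrix} 0&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1&0 \end{pmatrix},$

and

$T = S^3 + S^7 = \begin{pmatrix} 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 1&0&0&0&1&0&0&0&0 \\ 0&1&0&0&0&1&0&0&0 \end{pmatrix}.$


We discussed these operators in Section 2.5 (actually, their adjoints, but
it makes no difference). If $x = (x_1, x_2, \ldots, x_9) \in \mathbb{C}^9$,
we have


(5.3-2) $\langle Tx, x\rangle = x_1\bar{x}_4 + x_2\bar{x}_5 + x_3\bar{x}_6 + x_4\bar{x}_7 + x_5\bar{x}_8 + x_6\bar{x}_9 + x_1\bar{x}_8 + x_2\bar{x}_9.$


To make explicit the role of the permutations mentioned earlier, in
calculating the numerical radius, let us calculate $w(T)$ by using a
permutation. Let us write $y = (y_1, y_2, \ldots, y_9)$,
$x = (x_1, x_2, \ldots, x_9) = (y_7, y_4, y_1, y_8, y_5, y_2, y_9, y_6, y_3)$,
where the permutation is obvious. Then each term of (5.3-2) pairs $y_i$
with $y_{i+1}$, so that
$|\langle Tx, x\rangle| \le \sum_{i=1}^{8} |y_i||y_{i+1}|$, with equality
for nonnegative $y$. Hence $w(T) = \cos\frac{\pi}{10}$ (see Section 1.3).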

Figure 5 of Section 5.6 shows the full numerical range W (T) for this 9x9 
matrix T. 

Let us now form the matrix product


$ST = S^4 + S^8 = \begin{pmatrix} 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 1&0&0&0&1&0&0&0&0 \end{pmatrix} = TS.$

We have

(5.3-3) $\langle STx, x\rangle = \underbrace{x_1\bar{x}_5 + x_5\bar{x}_9 + x_1\bar{x}_9}_{\text{cycle}} + x_2\bar{x}_6 + x_3\bar{x}_7 + x_4\bar{x}_8.$


We can immediately conclude that $w(ST) = 1$ because of the presence of
a cycle. Notice that choosing $x_1 = x_5 = x_9 = \frac{1}{\sqrt{3}}$, all
other $x_i = 0$ will also yield the same result. Since $\|S\| = 1$, we have
$w(ST) > w(T)\|S\|$, as observed in Section 2.5.

The actual numerical range $W(TS)$ may be seen in Fig. 6 of Section 5.6.
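The two radii in this example can be confirmed numerically. The sketch below rests on two assumptions that should be kept in mind: it reads the operator as $T = S^3 + S^7$ (the reading under which $w(T) = \cos\frac{\pi}{10}$ and $ST = TS$ contains the triangle on the indices 1, 5, 9), and it estimates $w$ by the standard characterization $w(A) = \max_\theta \lambda_{\max}(\operatorname{Re}(e^{i\theta}A))$ on a grid of angles.

```python
import numpy as np

def numerical_radius(A, m=360):
    """Estimate w(A) as the maximum over m angles theta of the largest
    eigenvalue of the Hermitian part of exp(i*theta) * A."""
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, m, endpoint=False):
        H = np.exp(1j * theta) * A
        H = (H + H.conj().T) / 2.0
        best = max(best, np.linalg.eigvalsh(H)[-1])
    return best

S = np.eye(9, k=-1)                                   # nine-dimensional right shift
T = np.linalg.matrix_power(S, 3) + np.linalg.matrix_power(S, 7)   # assumed reading of T
ST = S @ T

w_T = numerical_radius(T)
w_ST = numerical_radius(ST)
print(w_T, np.cos(np.pi / 10))     # the two numbers should agree
print(w_ST)                        # should be 1
print(w_ST > w_T * np.linalg.norm(S, 2))
```

Since $S$ and $T$ commute, the last line exhibits commuting matrices with $w(ST) > w(T)\|S\|$.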
We have already seen two Gersgorin-type estimates for the numerical
range $W(A)$ in Section 5.2. In addition, we can use the following estimate
for the numerical radius $w(A)$. Let


(5.3-4) $b_i = \left( \sum_{j \ne i} |a_{ij}|^2 \right)^{1/2} + \left( \sum_{j \ne i} |a_{ji}|^2 \right)^{1/2}.$

Denote by $A_i$ the $i$th projection of $A$ obtained by deleting the $i$th
row and the $i$th column of $A$. Generally, let $c_i$ denote $w(A_i)$. An
estimate for $w(A)$ (see Abdurakmanov (1988), Notes and References) is
given in terms of these $c_i$ by


(5.3-5) $w(A) \le \min_{1 \le i \le n} w_{ii}, \qquad w_{ii} = \frac{|a_{ii}| + d_i}{2} + \frac{1}{2}\left[ (|a_{ii}| - d_i)^2 + b_i^2 \right]^{1/2},$


where $d_i$ is any arbitrary constant so that $c_i \le d_i$.

Let us illustrate the estimate (5.3-5) by a companion matrix example.
The companion matrix of a polynomial
$\lambda^n + p_1\lambda^{n-1} + \cdots + p_n$ is given by


(5.3-6) $A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -p_n & -p_{n-1} & \cdots & \cdots & -p_1 \end{pmatrix} = [a_{ij}].$


In particular, $A_n$ is rather simple for the companion matrix (5.3-6). In
this case, the $(n-1) \times (n-1)$ matrix $A_n = S_{n-1}$, the
$(n-1)$-dimensional shift, and we know that $w(A_n) = \cos\frac{\pi}{n}$.
We can simplify (5.3-5) further by choosing $d_i = c_i$. Since
$|a_{nn}| = |p_1|$, we get an estimate


(5.3-7) $w(A) \le \frac{|p_1| + \cos\frac{\pi}{n}}{2} + \frac{1}{2}\left[ \left( |p_1| - \cos\frac{\pi}{n} \right)^2 + b_n^2 \right]^{1/2}.$

Example. For the polynomial $\lambda^3 - 3\lambda^2 + 4\lambda - 2$, we
have

$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -4 & 3 \end{pmatrix}; \quad |p_1| = 3, \quad b_3 = 20^{1/2} + 1,$


from which (5.3-7) becomes

$w(A) \le 4.85.$


We know that $w(A) \approx 3.77$ (see Fig. 3 in Section 5.6).

Clearly, one needs to calculate the two other bounds in (5.3-5), and
optimize in $d_i$, to get the best bound by this method. Carrying out the
former, we note (e.g., by numerical calculation; see Section 5.6) that
$c_2 = w(A_2) \approx 3.30$ and $c_1 = w(A_1) \approx 3.62$, and again
taking $d_2 = c_2$ and $d_1 = c_1$, (5.3-5) yields the estimates
$w_{22} = 4.697$ and $w_{11} = 4.161$.


Notes and References for Section 5.3


An excellent exposition for the calculation of the numerical range of 0-1
matrices is given in
M. Marcus and B. N. Shure (1979). “The Numerical Range of Certain 0, 
1-Matrices,” Lin. and Multilin. Alg. 7, 111-120. 
The article contains additional results and a method of plotting the bound- 
ary of the numerical range. An extension of this result, and the example 
Sg given earlier, were taken from 
K. R. Davidson and J. A. R. Holbrook (1988). “Numerical Radii of Zero- 
One Matrices,” Michigan Math. J. 35, 261-267. 

The estimate (5.3-5) is given in 
A. A. Abdurakmanov (1988). “The Geometry of the Hausdorff Domain 
in Localization Problems for the Spectrum of Arbitrary Matrices,” Math. 
USSR Sbornik. 59, 39-51. 
See also 
C. Johnson (1974). “Gershgorin Sets and the Field of Values,” J. Math. 
Anal. Appl. 45, 416-419. 
for the earlier numerical radius estimate


(5.3-8) $w(A) \le \frac{1}{2} \max_i \left( \sum_j |a_{ij}| + \sum_j |a_{ji}| \right).$


Mappings of the numerical radius can be found in 
C. R. Johnson, I. M. Spitkovsky and S. Gottlieb (1994). “Inequalities 
Involving the Numerical Radius,” Linear and Multilinear Algebra 37, 13- 
24. 
There it is shown that if f and g are polynomials with real coefficients and 
A is a 2 x 2 matrix with tr A and det A real, then 


w(f(A)g(A)) < w(f(A))w(g(A)). 


Further, the following unsolved cases for 2 x 2 matrices are pointed out: 
1. A € R?*?, f and g have complex coefficients; 
2. A € C?*?, f(z) = z and g(z) = 2°; 
3. AE R™", f(z) =2*, g(z) =2™,k AmM2<n<k+m. 


5.4 Hadamard Product 


For any two matrices $A$ and $B$ of the same dimension, the Hadamard
product is defined as the entrywise product. Thus, if $A, B \in M_{m,n}$,
$A = [a_{ij}]$, $B = [b_{ij}]$, their Hadamard product is given by


(5.4-1) $A \circ B = [a_{ij}b_{ij}] \in M_{m,n}.$

Evidently, $A \circ B = B \circ A$,
$A \circ (B \circ C) = (A \circ B) \circ C$,
$A \circ (B + C) = A \circ B + A \circ C$ for any $A, B, C \in M_{m,n}$ and
$(\alpha A) \circ B = \alpha(A \circ B)$ for any scalar $\alpha$. Notice
that if $A \circ B = A$ and $A$ has no zero entries, then all the elements
of $B$ must be equal to 1.

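In this section, we study the numerical range of the product $A \circ B$,
where $A, B \in M_n$ and one of them, say $A$, is normal. Many of the
properties of $A \circ B$ can be derived from those of their Kronecker
product (5.4-2) defined below. When $A \in M_{m,n}$ and $B \in M_{p,q}$ are
two matrices of any dimension, $A = [a_{ij}]$, $B = [b_{ij}]$, their
Kronecker product is defined as the matrix given by the blocks


(5.4-2) $A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix} \in M_{mp,nq}.$

If

$A = \begin{pmatrix} 2 & 3 & 4 \\ a & b & c \end{pmatrix}$ and $B = \begin{pmatrix} \alpha \\ \beta \end{pmatrix},$

then, for example, we have

(5.4-3) $A \otimes B = \begin{pmatrix} 2\alpha & 3\alpha & 4\alpha \\ 2\beta & 3\beta & 4\beta \\ a\alpha & b\alpha & c\alpha \\ a\beta & b\beta & c\beta \end{pmatrix}.$


The associative and distributive laws can be easily verified. Further,
$(A \otimes B)^T = A^T \otimes B^T$, $(A \otimes B)^* = A^* \otimes B^*$,
$(\alpha A) \otimes B = A \otimes (\alpha B) = \alpha(A \otimes B)$.
Also, if $A \otimes B$ and $C \otimes D$ permit ordinary matrix
multiplication, we can verify that
$(A \otimes B)(C \otimes D) = AC \otimes BD$. If $A, B, C, D \in M_n$,
$A^{-1} = C$, $B^{-1} = D$, we can see that
$(A \otimes B)(C \otimes D) = AC \otimes BD = I \otimes I$. Notice,
however, that $A \otimes B$ is not necessarily $B \otimes A$.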
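These identities are easy to confirm with NumPy's `np.kron`. The sketch below uses small random real matrices whose shapes are chosen, as an assumption of the example, so that all the products are defined.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 1))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((1, 4))

# Mixed-product rule: (A x B)(C x D) = AC x BD.
mixed_ok = np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Transpose rule: (A x B)^T = A^T x B^T.
transpose_ok = np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# Scalar rule: (cA) x B = A x (cB) = c (A x B).
c = 2.5
scalar_ok = (np.allclose(np.kron(c * A, B), c * np.kron(A, B))
             and np.allclose(np.kron(A, c * B), c * np.kron(A, B)))

# The Kronecker product is not commutative in general.
commutes = np.allclose(np.kron(A, B), np.kron(B, A))

print(mixed_ok, transpose_ok, scalar_ok, commutes)
```

Both $A \otimes B$ and $B \otimes A$ here are $4 \times 3$, yet their entries are arranged differently, which is why the last flag comes out false.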

Let us now consider the case when A and B are square matrices, A, B € 
Mn. 


Theorem 5.4-1. Let $A, B \in M_n$, with eigenvalues $\lambda_i$ and
$\mu_j$, respectively. Then

(a) $\sigma(A \otimes B) = \{\lambda_i\mu_j,\ i, j = 1, 2, \ldots, n\}$,
including algebraic multiplicities.

(b) If $A$ and $B$ are positive Hermitian, then so is $A \otimes B$.


Proof. (a) Let $Ax = \lambda x$, $By = \mu y$, $x \ne 0$, $y \ne 0$. Then
$(A \otimes B)(x \otimes y) = Ax \otimes By = \lambda\mu\, x \otimes y$.
Using the Schur triangularization, let $U^*AU$, $V^*BV$ be upper triangular
where $U$ and $V$ are unitary and the eigenvalues of $A$ and $B$ form the
respective main diagonals. Then


$(U \otimes V)^*(A \otimes B)(U \otimes V) = U^*AU \otimes V^*BV,$


where both the factors are upper triangular, so the product is upper
triangular with the $\lambda_i\mu_j$ on its main diagonal.

(b) $A = A^*$ and $B = B^*$. Then
$(A \otimes B)^* = A^* \otimes B^* = A \otimes B$, and the eigenvalues
$\lambda_i\mu_j$ are positive by (a). □

Corollary 5.4-2. If $A$ and $B$ are positive definite matrices in $M_n$,
then so is $A \otimes B$.


Proof. $A \otimes B$ is Hermitian, and now use Theorem 5.4-1. □


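Theorem 5.4-3. Let $A \in M_n$, $B \in M_k$ be two square matrices. Then
(a) $\operatorname{co}[W(A)W(B)] \subset W(A \otimes B)$.
(b) If $A$ is normal, then $W(A \otimes B) = \operatorname{co}[W(A)W(B)]$.


Proof. (a) Let $\langle Ax, x\rangle \in W(A)$ and
$\langle By, y\rangle \in W(B)$, $\|x\| = \|y\| = 1$, with the
corresponding inner products. Then

$\langle x \otimes y, x \otimes y\rangle = \langle x, x\rangle\langle y, y\rangle = 1,$

$\langle Ax, x\rangle\langle By, y\rangle = \langle (A \otimes B)(x \otimes y), x \otimes y\rangle \in W(A \otimes B).$


Thus $W(A)W(B) \subset W(A \otimes B)$. Since $W(A \otimes B)$ is always
convex, we have $\operatorname{co}[W(A)W(B)] \subset W(A \otimes B)$.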

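(b) Let $A$ be normal and $U \in M_n$ a unitary matrix such that $U^*AU$ is
the diagonal matrix

$D = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_n).$

Since the numerical range is unitarily invariant, we have

$W(A \otimes B) = W[(U \otimes I)^*(A \otimes B)(U \otimes I)] = W(U^*AU \otimes B) = W(D \otimes B).$


To calculate $W(D \otimes B)$, let us choose a general vector
$x \otimes y$, where


(5.4-4) $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix}, \quad \text{and} \quad x \otimes y = \begin{pmatrix} x_1y \\ \vdots \\ x_ny \end{pmatrix} \in \mathbb{C}^{nk},$

and let us form the block matrix

$D \otimes B = \begin{pmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_n \end{pmatrix} \otimes B = \begin{pmatrix} \alpha_1B & & & \\ & \alpha_2B & & \\ & & \ddots & \\ & & & \alpha_nB \end{pmatrix}.$

Then

(5.4-5) $\langle (D \otimes B)(x \otimes y), x \otimes y\rangle = \sum_{i=1}^{n} \alpha_i x_i\bar{x}_i \langle By, y\rangle.$


Let $J = \{i \in (1, 2, \ldots, n) : x_i \ne 0\}$. Then


$z = \langle (D \otimes B)(x \otimes y), x \otimes y\rangle = \sum_{i \in J} x_i\bar{x}_i\,\alpha_i \langle By, y\rangle.$

Since $\alpha_i\langle By, y\rangle \in W(D)W(B)$ and
$\sum_{i \in J} |x_i|^2 = 1$, we have $z$ as a convex combination of
elements of $W(D)W(B)$, and hence $z$ belongs to
$\operatorname{co}[W(D)W(B)]$. The same computation applies to an arbitrary
unit vector of $\mathbb{C}^{nk}$ with blocks $z_i \in \mathbb{C}^k$, with
$|x_i|^2$ replaced by $\|z_i\|^2$ and $y$ by $z_i/\|z_i\|$ in the $i$th
term. The result (b) follows since $W(D) = W(A)$. □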

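We can now derive some results for the numerical range of the Hadamard
product (5.4-1) of $A \circ B$ by regarding it as a submatrix of
$A \otimes B$, as illustrated by the following example.

Example. Let

$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.$

Then

(5.4-6) $A \otimes B = \begin{pmatrix} \underline{a_{11}b_{11}} & a_{11}b_{12} & a_{12}b_{11} & \underline{a_{12}b_{12}} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ \underline{a_{21}b_{21}} & a_{21}b_{22} & a_{22}b_{21} & \underline{a_{22}b_{22}} \end{pmatrix}.$


Observe that the underlined elements form $A \circ B$. In general, for
$A, B \in M_n$, the elements of $A \circ B$ appear in the columns and rows
numbered $1, n + 2, 2n + 3, 3n + 4, \ldots, n^2$ of $A \otimes B$. The
following properties of $A \circ B$ can be deduced immediately from this
inclusion.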


Theorem 5.4-4. Let $A$ and $B$ be two commuting $n \times n$ matrices. Then
a) If $A$ and $B$ are Hermitian, so is $A \circ B$.
b) If $A$ and $B$ are positive semidefinite, so is $A \circ B$.
c) If $A$ is normal, then $w(A \circ B) \le w(A)w(B)$.


Proof. $A \circ B$ is a submatrix of $A \otimes B$ and hence by the
submatrix inclusion (Theorem 5.1-2),
$W(A \circ B) \subset W(A \otimes B)$. For (a) observe that
$W(A \otimes B)$ is real; for (b) $W(A \otimes B)$ is contained in the
right half-plane; and for (c) observe that, by Theorem 5.4-3(b),
$W(A \circ B) \subset \operatorname{co}[W(A)W(B)]$. □
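The submatrix relation behind this proof can be checked directly. In the sketch below, the index list `idx` is the 0-based form of the positions $1, n+2, 2n+3, \ldots, n^2$; the random positive semidefinite test matrices are an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = X @ X.conj().T                         # positive semidefinite test matrix
B = Y @ Y.conj().T                         # positive semidefinite test matrix

idx = [i * n + i for i in range(n)]        # rows/columns 1, n+2, 2n+3, ... (0-based)
K = np.kron(A, B)
sub = K[np.ix_(idx, idx)]                  # principal submatrix of the Kronecker product
hadamard = A * B                           # entrywise (Hadamard) product

same = np.allclose(sub, hadamard)
min_eig = np.linalg.eigvalsh(hadamard)[0]  # smallest eigenvalue of the Hadamard product
print(same, min_eig)
```

The smallest eigenvalue printed is nonnegative up to rounding, in line with part (b).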


Notes and References for Section 5.4 


The Hadamard product was initially studied by 

J. Schur (1911). “Bemerkungen zur Theorie der Beschränkten
Bilinearformen mit Unendlich vielen Veränderlichen,” J. Reine Angew. Math.
140, 1-28.

An excellent account and also an in-depth study of the Hadamard product 
and its applications can be found in 

R. A. Horn (1990). “The Hadamard Product,” Proc. Symposia in Appl. 
Math. 40, 87-120. 

Important results in the theory of Hadamard products can be seen in 

T. Ando, R. A. Horn and C. R. Johnson (1987). “The Singular Values of a 
Hadamard Product: A Basic Inequality,” Lin. Multilin. Alg. 21, 345-365. 
See also [HJ2].


5.5 Generalized Ranges 


The numerical range has found many generalizations with different
applications in view. We will treat one of these generalizations, which has
been extensively studied recently, mention some more, and then refer the
reader to further studies in the literature.


C-Numerical Range


One of the generalized ranges that has attracted a lot of attention is the
C-numerical range. For any $A \in M_n$ and a fixed $C \in M_n$,


(5.5-1) $W_C(A) = \{\operatorname{tr}(CU^*AU) : U \in \mathcal{U}_n\},$


where $\mathcal{U}_n$ is the group of unitary matrices in $M_n$. Evidently,
$W_C(A)$ is a unitary invariant and also $W_C(A) = W_A(C)$. Further,
$W_C(A) = W_B(A)$ if $B$ is unitarily similar to $C$. We will first see
when $W_C(A)$ is nontrivial, then study its numerical radius, and finally
look at its convexity properties. Since $W_C(A)$ is trivial when $C$ is a
scalar, let us first look at some situations in which $C$ is a scalar
operator.
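A few sample computations make the definition (5.5-1) concrete. The sketch below is illustrative: unitaries are drawn from the QR factorization of a complex Gaussian matrix (an assumption of the example, any source of unitary matrices would do), and `random_unitary` is a hypothetical helper. It checks that $C = \operatorname{diag}(1, 0, \ldots, 0)$ recovers the classical numerical range, that $W_C(A) = W_A(C)$ pointwise, and that a scalar $C$ gives a single point.

```python
import numpy as np

def random_unitary(n, rng):
    """Unitary factor of the QR factorization of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U = random_unitary(n, rng)

# C = diag(1, 0, 0): tr(C U* A U) is the (1,1) entry of U* A U, i.e. <Au, u>
# for the first column u of U, a point of the classical numerical range.
C = np.diag([1.0, 0.0, 0.0]).astype(complex)
point = np.trace(C @ U.conj().T @ A @ U)
u = U[:, 0]
classical = np.vdot(u, A @ u)

# Symmetry W_C(A) = W_A(C): the same point arises from V = U*.
V = U.conj().T
swapped = np.trace(A @ V.conj().T @ C @ V)

# Scalar C = 2I: every unitary gives the same value 2 tr(A).
scalar_point = np.trace(2.0 * np.eye(n) @ U.conj().T @ A @ U)

print(point, classical, swapped)
print(scalar_point, 2.0 * np.trace(A))
```

Sampling many unitaries and plotting the resulting points gives a rough picture of $W_C(A)$ for small $n$.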


Theorem 5.5-1. If $C$ leaves invariant all $m$-dimensional subspaces of
$\mathbb{C}^n$, $1 \le m < n$, then $C$ is a scalar.


Proof. Let $\{e_1, e_2, \ldots, e_n\}$ be an orthonormal basis in
$\mathbb{C}^n$. Then $Ce_i = \lambda_ie_i$, $i = 1, 2, \ldots, n$, as $C$
leaves invariant all one-dimensional subspaces.
Further $C(\sum_{k=1}^{n} e_k) = \mu(\sum_{k=1}^{n} e_k)$ and hence
$\lambda_i = \mu$, $i = 1, 2, \ldots, n$. □


Theorem 5.5-2. If $C$ commutes with $U^*AU$ for all
$U \in \mathcal{U}_n$, then either $A$ or $C$ is a scalar.


Proof. Suppose that $A$ is not a scalar and $\lambda$ an eigenvalue of $A$
with corresponding eigenspace $M_\lambda$, $\dim M_\lambda = m < n$. For
any $U \in \mathcal{U}_n$, $U^*AU$ also has $\lambda$ as an eigenvalue and
the corresponding eigenspace is $U^*M_\lambda$. So for
$y \in U^*M_\lambda$,
$U^*AU(Cy) = C(U^*AU)y = CU^*AUU^*x$ (for some $x \in M_\lambda$)
$= CU^*Ax = C\lambda U^*x = \lambda Cy$. So $Cy \in U^*M_\lambda$. Thus
$C$ leaves $U^*M_\lambda$ invariant. Since $U$ is arbitrary, $C$ leaves all
the $m$-dimensional subspaces invariant. Hence, by Theorem 5.5-1, $C$ is a
scalar. □

Another way in which C (or A) can be a scalar is that Wc(A) can be a 
constant. 


Theorem 5.5-3. If $\operatorname{tr}(CU^*AU)$ = constant for all
$U \in \mathcal{U}_n$, then $C$ commutes with $U^*AU$, and hence either $C$
or $A$ is a scalar.

Proof. For any $x \in \mathbb{R}$, $e^{xS}$ is unitary for any
skew-symmetric matrix $S$ (that is, $S^* = -S$). Consider the constant
equal to $\operatorname{tr}[Ce^{-xS}U^*AUe^{xS}]$. The derivative with
respect to $x$ at $x = 0$ is


$0 = \operatorname{tr}[(CU^*AU - U^*AUC)S].$


Every matrix $B \in M_n$ can be written as a linear combination of two
skew-symmetric matrices as


$B = \frac{1}{2}(B - B^*) - \frac{i}{2}\, i(B + B^*).$


Hence $\operatorname{tr}[(CU^*AU - U^*AUC)B] = 0$ for all $B \in M_n$. So
$C$ commutes with $U^*AU$ for all $U \in \mathcal{U}_n$ and hence is a
scalar (or $A$ is a scalar). □

The C-numerical radius $r_C(A)$ is given by


(5.5-2) $r_C(A) = \max\{|z| : z \in W_C(A)\}.$


Comparing it with the numerical radius w(A), which is an equivalent norm, 
we first study the norm properties of r¢(A). To this end, let us recall the 
following definitions. 

A function N : Mn — R is called a seminorm if for all A, B € Mn and 
all œ € C, 


N(A) 2 0, 
(5.5-3) N(aA) = |a|N(A), 
N(A+ B) S N(A) + N(B). 

A seminorm is a generalized matrix norm if it is positive definite or, equiv- 
alently, 

N(A)>0O whenever A#0O. 
A generalized matrix norm is a matrix norm if, for all A,B € Mn, 

N(AB) < N(A)N(B). 


Recalling the properties of W (A), the best that we can expect is that re (A) 
be a generalized matrix norm. 


Theorem 5.5-4. The $C$-numerical radius $r_C(A)$ is a generalized matrix
norm if and only if $C$ is not a scalar and has a nonzero trace.


Proof. Let $r_C(A) = 0$. Then $\operatorname{tr}(CU^*AU) = 0$ (a constant)
for all $U \in \mathcal{U}_n$. Hence, by Theorem 5.5-3, either $A$ or $C$
is a scalar. Suppose that $C$ is not a scalar. Then $A = \mu I$ and
$r_C(A) = |\mu \operatorname{tr} C| = 0$. If $\operatorname{tr} C \ne 0$,
we have $\mu = 0$ and hence $A = 0$. On the other hand, if $C$ is a scalar
$\lambda I$, then for any $A \ne 0$ with $\operatorname{tr} A = 0$ we have
$r_C(A) = |\lambda \operatorname{tr} A| = 0$. Further, if
$\operatorname{tr} C = 0$, then $r_C(I) = 0$. □

Before we look at the convexity of $W_C(A)$, let us observe that $W_C(A)$
is rather intractable for arbitrary matrices $C$ and $A$. Most of the known
results correspond to $C$ normal or Hermitian. When $C$ is normal, by the
unitary invariance of $W_C(A)$, we can assume
$C = \operatorname{diag}\{c_1, c_2, \dots, c_n\}$, where
$c_i \in \mathbb{C}$. Denoting the vector $(c_1, c_2, \dots, c_n)$ by $c$,
$W_C(A)$ is then usually written as $W_c(A)$. An equivalent formulation for
$W_c(A)$ is the following,


(5.5-4)    $W_c(A) = \Bigl\{ \sum_{i=1}^{n} c_i (Ae_i, e_i) : \{e_i\} \in \Lambda_n \Bigr\}$,


where $\Lambda_n$ is the set of orthonormal bases for $\mathbb{C}^n$. In
fact, this is the original formulation of $W_c(A)$. Let us first note that
even when $C$ and $A$ are normal, $W_c(A)$ need not be convex.


Example. In $\mathbb{C}^3$, let $C = A = \operatorname{diag}\{0, 1, i\}$.
Then $c = (0, 1, i)$, and let us first note that for any o.n. basis
$\{e_1, e_2, e_3\}$, we have


    $\sum_{i=1}^{3} c_i (Ae_i, e_i) = (Ae_2, e_2) + i(Ae_3, e_3)$.


Let us choose


    $e_1 = (0, 0, 1)^T$,  $e_2 = (0, 1, 0)^T$,  $e_3 = (1, 0, 0)^T$.

Then

    $Ae_2 = (0, 1, 0)^T$


and $(Ae_2, e_2) = 1$, $Ae_3 = 0 = (Ae_3, e_3)$. Thus $1 \in W_c(A)$. Let
us now choose


    $e_1 = (1, 0, 0)^T$,  $e_2 = (0, 0, 1)^T$,  $e_3 = (0, 1, 0)^T$.


Then $Ae_2 = ie_2$ and $(Ae_2, e_2) = i = i(Ae_3, e_3)$. Thus
$2i \in W_c(A)$. However, the midpoint $\frac{1}{2} + i \notin W_c(A)$, as
shown now. For any


    $e_2 = (x, y, z)^T$  and  $e_3 = (x', y', z')^T$,


let $|y|^2 + |z|^2 = \alpha^2$ and $|y'|^2 + |z'|^2 = \beta^2$. Then, we
have


    $\sum_{i=1}^{3} c_i (Ae_i, e_i) \in \alpha^2 W(B) + i\beta^2 W(B)$,


where $B$ is the two-dimensional diagonal matrix


    $B = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$.

$W(\alpha^2 B)$ is the segment joining $\alpha^2$ and $i\alpha^2$, and
$W(i\beta^2 B)$ is the segment joining $i\beta^2$ and $-\beta^2$. It can be
easily verified that $\frac{1}{2} + i \notin \alpha^2 W(B) + i\beta^2 W(B)$,
as follows. If it were the case, then for some
$x_\alpha + iy_\alpha \in \alpha^2 W(B)$ and
$x_\beta + iy_\beta \in i\beta^2 W(B)$ we would have
$x_\alpha + y_\alpha = \alpha^2$, $y_\beta - x_\beta = \beta^2$,
$x_\alpha + x_\beta = \frac{1}{2}$, and $y_\alpha + y_\beta = 1$, an
incompatible system.
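
The two membership claims in the Example, $1 \in W_c(A)$ and $2i \in W_c(A)$, can be checked by direct evaluation of the sum in (5.5-4). The following is a minimal sketch in plain Python; the helper names `quad` and `c_sum` and the two basis lists are illustrative choices made here (the bases are the coordinate permutations used in the Example), not notation from the text.

```python
def quad(A, e):
    """Return (Ae, e) = e^* A e for a square matrix A (list of rows)."""
    n = len(e)
    Ae = [sum(A[i][j] * e[j] for j in range(n)) for i in range(n)]
    return sum(Ae[i] * complex(e[i]).conjugate() for i in range(n))


def c_sum(c, A, basis):
    """Return sum_i c_i (A e_i, e_i) for an orthonormal basis (list of vectors)."""
    return sum(ci * quad(A, e) for ci, e in zip(c, basis))


# C = A = diag{0, 1, i}, so c = (0, 1, i).
A = [[0, 0, 0], [0, 1, 0], [0, 0, 1j]]
c = [0, 1, 1j]

# First basis of the Example: e1 = (0,0,1), e2 = (0,1,0), e3 = (1,0,0).
basis_one = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
# Second basis of the Example: e1 = (1,0,0), e2 = (0,0,1), e3 = (0,1,0).
basis_two = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

value_one = c_sum(c, A, basis_one)
value_two = c_sum(c, A, basis_two)
print(value_one, value_two)
```

This only confirms the two points that lie in $W_c(A)$; it does not by itself show that the midpoint is excluded, which is the content of the argument in the Example.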

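We will now study the convexity of $W_c(A)$ in some special cases.


Theorem 5.5-5. If $A \in M_2$ and $\alpha = (\alpha_1, \alpha_2) \in
\mathbb{C}^2$, then $W_\alpha(A)$ is convex.


Proof. Let $z = \alpha_1 (Ax_1, x_1) + \alpha_2 (Ax_2, x_2)$ for any
orthonormal basis $\{x_1, x_2\}$ of $\mathbb{C}^2$. Then


(5.5-5)    $z = (\alpha_1 - \alpha_2)\Bigl(\bigl(A - \tfrac{1}{2}(\operatorname{tr} A)I\bigr)x_1, x_1\Bigr) + \tfrac{1}{2}(\alpha_1 + \alpha_2)\operatorname{tr} A$,


which is an element of the numerical range of the matrix
$(\alpha_1 - \alpha_2)A + (\alpha_2 \operatorname{tr} A)I$. □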
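
The identity behind (5.5-5) rests on the fact that $(Ax_1, x_1) + (Ax_2, x_2) = \operatorname{tr} A$ for any orthonormal basis of $\mathbb{C}^2$. A short derivation, with the shorthand $t = (Ax_1, x_1)$ introduced here only for this sketch:

```latex
\begin{aligned}
z &= \alpha_1 t + \alpha_2(\operatorname{tr} A - t)
   = (\alpha_1 - \alpha_2)\,t + \alpha_2 \operatorname{tr} A \\
  &= (\alpha_1 - \alpha_2)\bigl(t - \tfrac12 \operatorname{tr} A\bigr)
     + \tfrac12(\alpha_1 + \alpha_2)\operatorname{tr} A .
\end{aligned}
```

The middle expression shows that $z$ lies in the numerical range of $(\alpha_1 - \alpha_2)A + (\alpha_2 \operatorname{tr} A)I$, which is convex by the Toeplitz-Hausdorff theorem.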

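We will now compare the numerical ranges corresponding to two vectors
$\alpha = (\alpha_1, \alpha_2)$ and $\beta = (\beta_1, \beta_2)$ such that
$\alpha_1 = a\beta_1 + (1 - a)\beta_2$,
$\alpha_2 = (1 - a)\beta_1 + a\beta_2$, $0 \le a \le 1$. We say that
$(\alpha_1, \alpha_2)$ is obtained from $(\beta_1, \beta_2)$ by pinching.

Theorem 5.5-6. If $(\alpha_1, \alpha_2)$ is obtained by pinching
$(\beta_1, \beta_2)$, then


(5.5-6)    $W_\alpha(A) \subset W_\beta(A)$.


Proof. As in Theorem 5.5-5, let us write


    $W_{(\alpha_1, \alpha_2)}(A) = (\alpha_1 - \alpha_2)W(B) + \tfrac{1}{2}(\alpha_1 + \alpha_2)(\operatorname{tr} A)$

    $\qquad = (2a - 1)(\beta_1 - \beta_2)W(B) + \tfrac{1}{2}(\beta_1 + \beta_2)(\operatorname{tr} A)$

and


    $W_{(\beta_1, \beta_2)}(A) = (\beta_1 - \beta_2)W(B) + \tfrac{1}{2}(\beta_1 + \beta_2)(\operatorname{tr} A)$,


where $B = A - \tfrac{1}{2}(\operatorname{tr} A)I$. By the elliptic range
theorem, $(\beta_1 - \beta_2)W(B)$ is an ellipse and
$\operatorname{tr} B = 0$. Hence $(\beta_1 - \beta_2)W(B)$ is symmetric
about the origin. Since $0 \le a \le 1$, we have $-1 \le 2a - 1 \le 1$, and
hence


    $(2a - 1)W(B) \subset W(B)$. □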


Theorem 5.5-6 can be generalized to two vectors $\alpha, \beta \in
\mathbb{C}^n$, when $\alpha$ is obtained from $\beta$ by pinching, that is,
when two components $\beta_i, \beta_j$ are replaced by
$a\beta_i + (1 - a)\beta_j$ and $(1 - a)\beta_i + a\beta_j$, where
$0 \le a \le 1$, all other components remaining unchanged.


Theorem 5.5-7. If $\alpha$ is obtained from $\beta$ by pinching, then
$W_\alpha(A) \subset W_\beta(A)$. If $\alpha$ is obtained from $\beta$ by a
finite number of pinchings, then $W_\alpha(A) \subset W_\beta(A)$.


Proof. Let $V$ be the subspace spanned by $x_i, x_j$ and $P$ the projection
on $V$. Then $W_\alpha(A)$ and $W_\beta(A)$ consist of


    $\sum_{k=1,\; k \ne i,j}^{n} \alpha_k (Ax_k, x_k) + W_{(\alpha_i, \alpha_j)}(PA)$


and


    $\sum_{k=1,\; k \ne i,j}^{n} \alpha_k (Ax_k, x_k) + W_{(\beta_i, \beta_j)}(PA)$.

However, $W_{(\alpha_i, \alpha_j)}(PA) \subset W_{(\beta_i, \beta_j)}(PA)$
and hence

    $W_\alpha(A) \subset W_\beta(A)$.


For the second part of the theorem, apply the first part repeatedly. □

Let us now specialize to the case when $\alpha$ and $\beta$ are real
vectors. Let $\beta$ be ordered
($\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$) and
$\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i$. Then, we can easily see
that


(5.5-7)    $\sum_{i=1}^{k} \alpha_i \le \sum_{i=1}^{k} \beta_i$,  $k = 1, \dots, n$.


From (5.5-7) we may obtain an interesting relation between pinching and
doubly stochastic matrices when $\alpha$ and $\beta$ are real vectors and
$\beta$ is ordered. Recall that a matrix $A \in M_n$ is called doubly
stochastic if all of its entries are nonnegative and all row sums and
column sums are 1.


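Theorem 5.5-8. Let $\alpha, \beta$ be real $n$-vectors,
$\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i$, with $\beta$ ordered,
$\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$. If $\alpha$ is obtained from
$\beta$ by pinching a finite number of times, then there exists a doubly
stochastic matrix $S$ such that


    $\alpha = S\beta$.


Proof. Let us first see the effect of one pinching. If $\gamma$ is obtained
from $\beta$ by pinching $\beta_i, \beta_j$, we have

    $\gamma_i = a\beta_i + (1 - a)\beta_j$,

    $\gamma_j = (1 - a)\beta_i + a\beta_j$.

If $i < j$, we have


(5.5-8)    $\gamma = \begin{pmatrix} I & & & & \\ & a & & 1-a & \\ & & I & & \\ & 1-a & & a & \\ & & & & I \end{pmatrix} \beta$,


where the entries $a$, $1 - a$ occupy rows and columns $i$ and $j$ and the
matrix agrees with the identity elsewhere. The matrix involved is doubly
stochastic and hence $\alpha = S\beta$, where $S$ is the product of doubly
stochastic matrices and is doubly stochastic. □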
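
The relation between one pinching and a doubly stochastic matrix can be illustrated numerically. The sketch below is plain Python; the function names `pinch`, `pinching_matrix`, `apply_matrix`, and `is_doubly_stochastic`, as well as the sample vector, are assumptions made for this illustration and do not come from the text.

```python
def pinch(beta, i, j, a):
    """Replace beta[i], beta[j] by a*beta[i] + (1-a)*beta[j] and its mirror."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("the pinching parameter must satisfy 0 <= a <= 1")
    gamma = list(beta)
    gamma[i] = a * beta[i] + (1 - a) * beta[j]
    gamma[j] = (1 - a) * beta[i] + a * beta[j]
    return gamma


def pinching_matrix(n, i, j, a):
    """Identity matrix modified in rows/columns i and j by a and 1 - a."""
    S = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    S[i][i], S[i][j] = a, 1 - a
    S[j][i], S[j][j] = 1 - a, a
    return S


def apply_matrix(S, v):
    """Matrix-vector product S v with v treated as a column vector."""
    return [sum(S[r][c] * v[c] for c in range(len(v))) for r in range(len(S))]


def is_doubly_stochastic(S, tol=1e-12):
    """Check nonnegative entries and unit row and column sums."""
    n = len(S)
    rows_ok = all(abs(sum(S[r]) - 1.0) <= tol for r in range(n))
    cols_ok = all(abs(sum(S[r][c] for r in range(n)) - 1.0) <= tol for c in range(n))
    nonneg = all(S[r][c] >= 0.0 for r in range(n) for c in range(n))
    return rows_ok and cols_ok and nonneg


beta = [3.0, 1.0, 0.0]            # ordered: 3 >= 1 >= 0
gamma = pinch(beta, 0, 2, 0.75)   # pinch the first and third components
S = pinching_matrix(3, 0, 2, 0.75)
print(gamma, apply_matrix(S, beta), is_doubly_stochastic(S))
```

For this sample the component sum is preserved and each leading partial sum of `gamma` is no larger than the corresponding partial sum of `beta`, in line with the majorization relation discussed in this section.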
Let us now use the above development to study the case of $W_\alpha(A)$
when $\alpha$ is a real vector
$\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$. Let us write for
any $b, c \in \mathbb{R}^n$, $b \prec c$ if there exists a doubly
stochastic matrix $S$ such that $b = cS$. For example, in Theorem 5.5-8,
we could write $\alpha \prec \beta$, i.e., $\alpha$ is majorized by
$\beta$.

Theorem 5.5-9. For $A, C \in M_n$ and $C$ Hermitian, $W_C(A)$ is convex.


Proof. Since $W_C(A)$ is a unitary invariant, we may take $C$ as the
diagonal matrix $[c]$, where $c \in \mathbb{R}^n$. Let
$\{U^*[c]U : U \in \mathcal{U}_n\}$ be denoted by $\mathcal{M}(c)$. Then
$W_C(A) = \{\operatorname{tr}(Ax), x \in \mathcal{M}(c)\} \subset
\{\operatorname{tr}(Ax), x \in \operatorname{conv} \mathcal{M}(c)\}$. We
will now make use of a result (see Notes at the end of this section) which,
with the majorization notation described above, shows that
$\operatorname{conv} \mathcal{M}(c) = \{U^*[b]U, U \in \mathcal{U}_n,
b \prec c\}$. We will then have
$\operatorname{tr}(Ax) \in W_b(A) \subset W_c(A)$ by Theorem 5.5-7. Hence
$W_c(A) = \{\operatorname{tr}(Ax), x \in \operatorname{conv}
\mathcal{M}(c)\}$, which is convex. □

Some other variations on the classical numerical range are the following. 


k-Numerical Range 


Let us note that the $k$-numerical range $W_k(A)$ defined as


    $W_k(A) = \Bigl\{ z : z = \sum_{i=1}^{k} (Ax_i, x_i)$, where
    $\{x_1, \dots, x_k\}$ are $k$ orthonormal vectors in $\mathbb{C}^n \Bigr\}$


is a special case of $W_c(A)$, where $c \in \mathbb{R}^n$.


F-Numerical Range 

If $\|\cdot\|$ is any norm in $\mathbb{C}^n$, the norm induced in its dual
space is given by


    $\|y\|^* = \sup\{|y^*x|, x \in \mathbb{C}^n, \|x\| = 1\}$.

The Bauer field of values, also known more generally as the spatial
numerical range $V(A)$, is given by

    $F(A) = \{y^*Ax, x \in \mathbb{C}^n, \|x\| = \|y\|^* = y^*x = 1\}$.


This generalized numerical range was discussed in Section 1.6. In
particular, $F(A)$ is not necessarily convex.


Algebraic Numerical Ranges 


A generalization of $F(A)$ called the algebraic numerical range is obtained
in the following manner. For any $A \in M_n$, let
$\|A\| = \sup\{\|Ax\|, x \in \mathbb{C}^n, \|x\| = 1\}$. Let
$(M_n, \|\cdot\|^*)$ be the dual space given by


    $\|B\|^* = \sup\{|\operatorname{tr}(BA)| : A \in (M_n, \|\cdot\|), \|A\| = 1\}$.

The algebraic numerical range is taken to be

    $\tau(A) = \{\operatorname{tr}(BA) : \|B\|^* = \operatorname{tr} B = 1\}$.


It is known (see the References at the end of this section) that
$\tau(A) = \bigcap D[\gamma, \rho]$, where $D[\gamma, \rho]$ is the disk
$\{z : |z - \gamma| \le \rho\}$, and the intersection is over all pairs
$\gamma, \rho$ for which $\|A - \gamma I\| \le \rho$. In general, $\tau(A)$
is the convex hull of $F(A)$.


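Example. Let the norm chosen be $\|\cdot\|_\infty$. Then, for any
$A = [a_{ij}] \in M_n$, let $D_i(A)$ denote the Geršgorin disk (see
Section 5.2) $D_i(A) = D[a_{ii}, \rho_{ii}]$, where
$\rho_{ii} = \sum_{k \ne i} |a_{ik}|$. Then


    $\tau(A) = \operatorname{conv}\Bigl(\bigcup_{i=1}^{n} D_i(A)\Bigr)$.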
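
The disks entering this example are easy to tabulate. The following plain-Python sketch lists each Geršgorin disk as a (center, radius) pair using the row sums described above; the function name `gersgorin_disks` and the sample tridiagonal matrix are hypothetical choices for illustration only.

```python
def gersgorin_disks(A):
    """Return [(a_ii, rho_ii)] with rho_ii = sum over k != i of |a_ik|."""
    n = len(A)
    return [
        (A[i][i], sum(abs(A[i][k]) for k in range(n) if k != i))
        for i in range(n)
    ]


# Hypothetical 3 x 3 sample matrix (not one of the matrices from the text).
sample = [
    [4, -1, 0],
    [-1, 4, -1],
    [0, -1, 4],
]
disks = gersgorin_disks(sample)
for center, radius in disks:
    print("center", center, "radius", radius)
```

According to the example above, for the norm $\|\cdot\|_\infty$ the algebraic numerical range is the convex hull of the union of these disks; the sketch only produces the disks and leaves the hull to a plotting or geometry routine.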


For algebraic numerical ranges for the infinite-dimensional case, such as 
Banach algebras, see [BD]. Those are usually denoted as v(A) in those 
contexts. 


M -Numerical Range 


A variation on algebraic numerical ranges motivated by applications in
stability analysis is given by


    $\tau_M(A) = \bigcap D[\gamma, \rho]$,


where the intersection is over all pairs $\gamma, \rho$ for which
$\|(A - \gamma I)^k\| \le M\rho^k$, where $M \ge 1$ is fixed and
$k = 1, 2, \dots$ . Some elementary properties that can be easily verified
are


    $\tau_M(A)$ is compact and convex,
    $\tau_M(\alpha I + \beta A) = \alpha + \beta\tau_M(A)$,
    $\tau_M(A) \subset \tau_N(A)$ if $1 \le N \le M$,
    $\sigma(A) \subset \tau_M(A)$.


Usually, $M > 1$ in applications. The motivation for $\tau_M(A)$ is to
obtain containment sets for $\sigma(A)$ tighter than those provided just by
$W(A)$.


Restricted Numerical Ranges 


The subset of the complex numbers consisting of

    $W_S(T) = \{(Tx, x), \|x\| = 1, x \in S\}$,


where $S \subset H$ is a prescribed set, is called a restricted numerical
range. The $\delta$-numerical range $W_\delta(T)$ is an example of this:
$S = \{x : \|Tx\| \ge \delta\}$. When $S$ is the whole unit sphere, we have
the classical numerical range.

Let us now consider two properties on $S$ that will guarantee that
$W_S(T)$ will be convex:

(i) $x \in S$ implies $\alpha x \in S$ if $|\alpha| = 1$;

(ii) if $x, y \in S$, then for every $r > 0$, either


    $\dfrac{x + ry}{\|x + ry\|} \in S$  or  $\dfrac{x - ry}{\|x - ry\|} \in S$.


Theorem 5.5-10. If $S$ satisfies the properties (i) and (ii), then
$W_S(T)$ is convex.


Proof. Let $x, y \in S$, $\|x\| = \|y\| = 1$. Then, for $0 < t < 1$,
consider


(5.5-9)    $\dfrac{(T(x + \alpha y), x + \alpha y)}{\|x + \alpha y\|^2} = t(Tx, x) + (1 - t)(Ty, y)$,

the condition for convexity of $W_S(T)$. Simplifying (5.5-9), we get an
expression of the form
$|\alpha|^2 + a\alpha + b\bar{\alpha} - \frac{1 - t}{t} = 0$, where $a$ and
$b$ are complex numbers depending only on $T, x, y$ but not on $\alpha$.
Separating the real and imaginary parts and observing that
$\frac{1 - t}{t} > 0$, we get the equation of a line passing through the
origin. Thus there exist two values of $\alpha$ satisfying (5.5-9), namely
the intersections of the line with the circle. The assumptions on $S$ now
guarantee that one of these belongs to $S$. □


Corollary 5.5-11. $W_\delta(T)$ is convex.

Proof. Notice that
$\{x : \|Tx\| \ge \delta\} = \{x : \|\sqrt{T^*T}\,x\| \ge \delta\}$ and
$\sqrt{T^*T}$ is selfadjoint. In general, a set
$S = \{x \in H, \|x\| = 1, (Tx, x) \ge \delta\}$, in the case of $T$
selfadjoint, satisfies conditions (i) and (ii). This is seen as follows.
For $r > 0$, $x, y \in S$, we have
$(T(x \pm ry), x \pm ry) = (Tx, x) + r^2(Ty, y) \pm 2r
\operatorname{Re}(Tx, y)$. Thus


    $\dfrac{(T(x \pm ry), x \pm ry)}{\|x \pm ry\|^2} \ge \delta \pm \dfrac{2r \operatorname{Re}(Tx - \delta x, y)}{\|x \pm ry\|^2}$.


Depending on the sign of $\operatorname{Re}(Tx - \delta x, y)$, we can say
that


    $\dfrac{x + ry}{\|x + ry\|} \in S$  or  $\dfrac{x - ry}{\|x - ry\|} \in S$.


Corollary 5.5-11 follows by reduction to this case and Theorem 5.5-10. □


Symmetric Numerical Range 


Let || || be a norm on C” and || ||* be the dual norm induced. Let S = {x € 
C”, |lz|| = 1} and S* = {y € C”, |ly||* = 1} denote the Cartesian product 
S x S* by a. 

The symmetric numerical range of a matrix A is defined as 


Z(A) = conv £ (y* Ax + x* Ay), (x,y) € n} . 


$Z(A)$ satisfies all the usual properties (convex, spectral inclusion,
$Z(A + B) \subset Z(A) + Z(B)$, $Z(\lambda A) = \lambda Z(A)$) of a
numerical range (see the Notes at the end of this section).


Notes and References for Section 5.5 


C-numerical ranges have been studied extensively. A recent review of the 
results and some open problems can be seen in 
C. K. Li (1994). “C-Numerical Ranges and C-Numerical Radii,” Lin.
Multilin. Alg. 37, 51-82.
Another review article is that of 
M. Goldberg (1979). “On Certain Finite Dimensional Numerical Ranges 
and Numerical Radii,” Lin. Multilin. Alg. 7, 329-342. 

Inequalities involving the C-numerical radius can be found in 
M. Goldberg and E. G. Straus (1979). “Norm Properties of C-Numerical 
Radii,” Lin. Alg. Appl. 24, 113-131. 

The geometry of Wc(A) has been studied by various authors. A review 
can be found in 
N. Bebiano and J. da Providencia (1994). “Some Geometrical Properties of 
the Numerical Range of a Normal Matrix,” Lin. Multilin. Alg. 37, 83-92. 
The technique of pinching and its use in inclusion relations can be found
in
M. Goldberg and E. G. Straus (1977). “Elementary Inclusion Relations for
Generalized Numerical Ranges,” Lin. Alg. Appl. 18, 11-24.
The convexity of $W_c(A)$ and Theorem 5.5-9 when $c$ is real or a rotation
of a real vector was shown by
R. Westwick (1975). “A Theorem on Numerical Range,” Lin. Multilin. 
Alg. 2, 311-315. 
The simplified version we gave is due to 
Y.T. Poon (1980). “Another Proof of a Result of Westwick,” Lin. Multilin. 
Alg. 9, 35-37. 

$W_C(A)$, although not necessarily convex for normal $C$, is star-shaped.
This was shown in
N. K. Tsing (1981). “On the Shape of the Generalized Numerical Range,” 
Lin. Multilin. Alg. 10, 173-182. 
An infinite-dimensional version of the above theorem appears in 
M. S. Jones (1991), “A Note on the Shape of the Generalized C-Numerical 
Range,” Linear and Multilinear Algebra 31, 81-84. 
An earlier version can be found in 
G. Hughes (1990). “A Note on the Shape of the Generalized Numerical 
Range,” Linear and Multilinear Algebra, 26, 43—47. 

It has been announced in 
N. K. Tsing and W. S. Cheung (1996). “Star-Shapedness of the Generalized 
Numerical Ranges,” in Abstracts 3rd Workshop on Numerical Ranges and 
Numerical Radii (T. Ando and K. Okubo, eds.), Sapporo, Japan. 
that Wc(A) is always star-shaped with center (tr A)(trC)/n. 

The M-numerical range and its applications can be seen in 
M. N. Spijker (1993). “Numerical Ranges and Stability Estimates,” Appl.
Num. Math. 13, 241-249.
H. W. J. Lenferink and M. N. Spijker (1990). “A Generalization of the 
Numerical Range of a Matrix,” Lin. Alg. Appl. 140, 251-266. 

The $\delta$-numerical range was first studied by
J. Stampfli (1970). “The Norm of a Derivation,” Pacific Journal of Math.
33, 737-747.
The convexity of $W_\delta$ was proved by
J. Kyle (1977). “$W_\delta$ is Convex,” Pacific Journal of Math. 72, 483-485.

The generalization to the restricted numerical range was given in 
K. Das, S. Mazumdar and B. Sims (1987). “Restricted Numerical Range 
and Weak Convergence on the Boundary of the Numerical Range,” J. Math. 
Phys. Sci. 21, 35-41. 

The k-numerical range first mentioned by Halmos in [H] was studied and 
generalized in 
Y. Poon (1980). “The Generalized k-Numerical Range,” Linear and Mul- 
tilinear Algebra 9, 181-186. 
M. Marcus (1979). “Some Combinatorial Aspects of the Generalized Nu- 
merical Range,” Ann. New York Acad. Sci. 319, 368-376. 
Y. Au-Yeung and N. Tsing (1983). “A Conjecture of Marcus on the
Generalized Numerical Range,” Linear and Multilinear Algebra 14, 235-239.
W. Man (1987). “The Convexity of the Generalized Numerical Range,”
Linear and Multilinear Algebra 20, 229-245.

For details on the symmetric numerical range, see 
B. D. Saunders and H. Schneider (1976). “A Symmetric Numerical Range 
for Matrices,” Numer. Math. 26, 99-105. 


5.6 W(A) Computation 


As an experimental tool, it is convenient to have a code to produce W(A) 
graphics. There are essentially two ways to go about this: either from 
without or from within. 

Approximation from without uses the elementary idea that the boundary
$\partial W(A)$ may be traced by computing the maximum eigenvalue
$\lambda_\theta^{\max}$ and the associated eigenvector $x_\theta$ of the
real part of $e^{i\theta}A$ as $\theta$ runs over a reasonably fine
discretization of $0 \le \theta \le 2\pi$. The point
$(Ax_\theta, x_\theta)/(x_\theta, x_\theta)$ will be a boundary point of
$W(A)$. The graph of $W(A)$ can also be constructed using only the values
$\lambda_\theta^{\max}$. In that case, we have


(5.6-1)    $W(A) \subset \bigcap_{0 \le \theta \le 2\pi} \bigl[\text{half-plane} : e^{-i\theta}\{z : \operatorname{Re} z \le \lambda_\theta^{\max}\}\bigr]$.


The computation thus reduces to a subroutine for $\lambda_{\max}$.
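
The outside-in idea can be sketched in a few lines of plain Python. This is not the code used for the figures in this book; it is a minimal illustration under stated assumptions: the largest eigenpair of the Hermitian part of $e^{i\theta}A$ is found by a shifted power iteration (a swapped-in technique chosen to avoid external libraries, which can be slow when the two largest eigenvalues are close, and which assumes the fixed start vector is not orthogonal to the top eigenvector), and the names `rotated_hermitian_part`, `top_eigenvector`, and `boundary_points` are hypothetical.

```python
import cmath
import math


def rotated_hermitian_part(A, theta):
    """Return H = (e^{i theta} A + (e^{i theta} A)^*) / 2 as a list of rows."""
    n = len(A)
    w = cmath.exp(1j * theta)
    return [[(w * A[i][j] + (w * A[j][i]).conjugate()) / 2 for j in range(n)]
            for i in range(n)]


def top_eigenvector(H, iterations=500):
    """Unit eigenvector for the largest eigenvalue of Hermitian H.

    Power iteration on H + shift*I, where shift is a Gersgorin-type bound
    on the spectral radius, so that the shifted matrix is positive
    semidefinite and its dominant eigenvector is the one we want.
    """
    n = len(H)
    shift = max(sum(abs(v) for v in row) for row in H)
    x = [complex(1.0, 0.0)] + [complex(0.5, 0.3)] * (n - 1)
    norm = math.sqrt(sum(abs(v) ** 2 for v in x))
    x = [v / norm for v in x]
    for _ in range(iterations):
        y = [sum(H[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = math.sqrt(sum(abs(v) ** 2 for v in y))
        if norm == 0.0:  # H is the zero matrix: any unit vector will do
            break
        x = [v / norm for v in y]
    return x


def boundary_points(A, steps=64):
    """Return points (A x_theta, x_theta) on the boundary of W(A)."""
    n = len(A)
    points = []
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        x = top_eigenvector(rotated_hermitian_part(A, theta))
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        points.append(sum(Ax[i] * x[i].conjugate() for i in range(n)))
    return points


# For this nilpotent 2 x 2 matrix, W(A) is the closed unit disk.
A = [[0, 2], [0, 0]]
pts = boundary_points(A, steps=16)
print([round(abs(p), 6) for p in pts])
```

Joining consecutive points gives an inscribed polygon, while the half-planes in (5.6-1) give a circumscribed one; in practice a library eigenvalue routine would replace the power iteration.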

Approximation from within uses the fact (Section 1.1) that $W(A)$ is the
union of numerical ranges of two-dimensional submatrices. More precisely,
if $A$ is an $n \times n$ general complex matrix, then $W(A)$ is the union
of all $W(A_{uv})$, where


    $A_{uv} = \begin{pmatrix} (Au, u) & (Av, u) \\ (Au, v) & (Av, v) \end{pmatrix}$,


where $u$ and $v$ run over all pairs of real orthonormal vectors. The
computation thus reduces to generating a reasonably complete set of random
orthonormal pairs $u, v$ and plotting the elliptical disks $W(A_{uv})$ on
top of each other, thus shading the interior of $W(A)$.
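
A minimal plain-Python sketch of the inside-out idea follows. It draws one random real orthonormal pair, forms the $2 \times 2$ compression, and returns the data of the elliptical disk given by the elliptic range theorem: foci at the eigenvalues $\lambda_1, \lambda_2$ and minor axis of length $\sqrt{\operatorname{tr}(M^*M) - |\lambda_1|^2 - |\lambda_2|^2}$. The function names and the seeded generator are assumptions made here; actual plotting is left out.

```python
import cmath
import math
import random


def random_real_orthonormal_pair(n, rng):
    """Gram-Schmidt on two Gaussian vectors; retries if nearly degenerate."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm_u = math.sqrt(sum(t * t for t in u))
        if norm_u < 1e-12:
            continue
        u = [t / norm_u for t in u]
        dot = sum(a * b for a, b in zip(u, v))
        v = [b - dot * a for a, b in zip(u, v)]
        norm_v = math.sqrt(sum(t * t for t in v))
        if norm_v < 1e-12:
            continue
        return u, [t / norm_v for t in v]


def form(A, x, y):
    """Return (Ax, y) for real vectors x and y."""
    n = len(x)
    return sum(y[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n))


def compression(A, u, v):
    """Return the 2 x 2 matrix A_uv of the text."""
    return [[form(A, u, u), form(A, v, u)],
            [form(A, u, v), form(A, v, v)]]


def ellipse_data(M):
    """Foci (eigenvalues) and minor-axis length of W(M) for a 2 x 2 matrix."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    frob2 = sum(abs(M[i][j]) ** 2 for i in range(2) for j in range(2))
    minor = math.sqrt(max(frob2 - abs(lam1) ** 2 - abs(lam2) ** 2, 0.0))
    return lam1, lam2, minor


rng = random.Random(0)
A = [[0, 2], [0, 0]]          # W(A) is the closed unit disk
u, v = random_real_orthonormal_pair(2, rng)
lam1, lam2, minor = ellipse_data(compression(A, u, v))
print(lam1, lam2, minor)
```

For an $n \times n$ matrix one would repeat the draw many times and overlay the resulting disks, as described above.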

In the figures that follow, we preferred the outside-in approach, which 
takes efficient computational advantage of the bounded convexity of W (A). 
On the other hand, we would like to mention that the inside-out approach 
is theoretically more interesting because it constructs all two-dimensional 
real compressions of A. 

For finite matrices, the outside-in approach has some clear advantages of 
coding ease and efficiencies. On the other hand, we would like to note that 
if one wanted to extend these W(A) computations to unbounded operators 
A, the inside-out approach would still be valid, whereas the outside-in 
approach would have to be modified because of potential discrepancies 
between the domains D(A) and D(A*) when forming Re A.
Figures 1, 2, and 3 are the numerical ranges W(A) of the three examples
1, 2, and 3 of Section 5.2, respectively. Fig. 4 is W(A) for the 0-1 example 
A of Section 5.3. Fig. 5 is W(T) for T = SÌ + S7 of Section 5.3; W (S) is 
exactly the same. Fig. 6 is W(TS) of that section. Because the numerical 
ranges of A and its adjoint A* are the same for real matrices, Fig. 5 and 6 
are also those for the T and TS of Section 2.5. 

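Figure 7 is W(A) for a simple, nonsymmetric, banded 0-1 matrix which
might come from a discretization of a first-order differential equation or
from some Toeplitz matrix application. There the matrix A is


    $A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$.

Compare with Fig. 8, in which the band structure is missing, where A is

    $A = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$.
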

Notice w(A) = 1 for the latter, in accordance with Section 5.3. Figure 9 is
the numerical range W(A) of


    $A = \begin{pmatrix} 4 & 0 & 0 & -1 \\ -1 & 4 & 0 & 0 \\ 0 & -1 & 4 & 0 \\ 0 & 0 & -1 & 4 \end{pmatrix}$,


which might come from the discretization of a second-order partial
differential equation.


Figure 6. W(A) for A

The W(A) computation from outside-in is due to
C. Johnson (1978). “Numerical Determination of the Field of Values of a 
General Complex Matrix,” SIAM J. Numer. Anal. 15, 595-602. 

The W(A) computation from inside-out is due to 
M. Marcus (1987). “Computer Generated Numerical Ranges and Some 
Resulting Theorems,” Linear and Multilin. Alg. 20, 121-157. 

Also see [M] for more information on W (A) and other matrix computations 
on personal computers. 

The code and graphics for the figures produced here were written by 
Dr. G. Sartoris, using a variant of the outside-in supporting-hyperplane 
approach. One can shade the outside with tangent lines if so desired. We 
chose to do so to better highlight W(A). 

W(A) will be symmetric about the x-axis (the real axis) for all real
matrices A, as in the figures presented here. Minor modifications enable
similar computation of W(A) for complex matrices A.


Endnotes for Chapter 5 


The core of the numerical range theory lies in the Hilbert space setting. Ba- 
sic finite-dimensional theory is included in the Hilbert space theory, which, 
historically speaking, prompted generalizations to the Banach space situa- 
tions. 

Numerical range theory in finite dimensions, beyond its intrinsic interest 
as a part of matrix analysis, mainly has applications as a goal. As we have 
presented it, Section 5.1 gives applications in matrix theory, and Sections 
5.2 and 5.3 are devoted to methods for finding bounds on the spectrum and 
on the numerical range. The theme of Section 5.4, the Hadamard product, 
originated in applications to differential equations. Many of the general- 
izations of numerical ranges in Section 5.5 were motivated by applications, 
and we presume and hope that some more generalizations will appear in 
the future with specific applications in view. 

We may synthesize a tentative suggestion: an important emphasis in nu- 
merical range theory for finite dimensions in the immediate future should 
be to connect more specifically with the enormous number of new and 
important matrix classes now coming out of applications and matrix com- 
putations. For example, each basic matrix iterative method (e.g., Jacobi, 
Gauss-Seidel, successive over relaxation), to say nothing of all the impor- 
tant recent variations on the conjugate gradient method (such as GMRES, 
Orthomin, Lanczos methods), all implicit or semi-implicit methods (such as
ADI, etc.,...) coming out of discretizations of two- and three-dimensional 
physical problems in computational fluid dynamics, computational physics, 
computational chemistry, computational engineering in general, all classes
of sparse and other matrices coming out of econometrics and statistics,
should become candidates for connection to the theory of the numerical 
range. This would increase the richness of the subject of numerical range 
and aid understanding of the properties of each specific matrix class now 
being used extensively in applications and computations. 

For example, to illustrate this synthesis, we chose the matrix A of Fig. 7
to be representative of the type studied by
L. N. Trefethen (1990). “Approximation Theory and Numerical Linear 
Algebra,” in Algorithms for Approximation. II, eds. J. Mason and M. Cox, 
Chapman, London, 336-360. 
in the investigation of pseudo-eigenvalues (see Section 4.6 here) and of the 
type studied by 
M. Eiermann (1993). “Fields of Values and Iterative Methods,” Linear 
Alg. & Applic. 180, 167-197. 
in the investigation of matrix classes arising from linear solver iterative 
schemes (see Fig. 2b there). Figs. 8 and 9 are numerical ranges for 
matrices A we took from 
D. M. Young (1971). Iterative Solution of Large Linear Systems, Academic
Press, New York. 
Specifically, Fig. 8 is W(A) for A on p. 426, a matrix that is irreducible 
CO(2,3) but is not a CO*(2,3) matrix, in the terminology of property- 
A matrices discussed there. The W(A) of Fig. 9 is that of matrix A of 
Exercise 11, p. 431, a matrix that may be permuted so that $P^{-1}AP$ is a
CO(2,2) matrix.

We may mention another example, the 113 x 113 sparseness matrix A of 
K. Gustafson and R. Hartman (1985). “Graph Theory and Fluid Dynam- 
ics,” SIAM J. Alg. Disc. Math. 6, 643-656, 
see also 
K. Gustafson (1987). Partial Differential Equations, 2nd Ed., Wiley, New 
York, 311. 
which would certainly have an interesting numerical range. That matrix 
represents the support, found by graph theoretic methods, of a basis for 
discrete solenoidal vectors for a finite-element scheme for the weak solution 
of a rather simple partial differential equation coming from linearized fluid 
dynamics. 

These examples carry no special significance in themselves and were 
quickly chosen just to illustrate the synthesis that we are advancing in this 
endnote. During the twenty-five years since Young’s book (1971), many 
more interesting classes of matrices have been investigated for iterative so- 
lutions. Similarly, for partial differential equations, finite-element methods 
have augmented finite-difference methods in applications, yielding many 
interesting classes of large sparse matrices. Depending on the grid gen- 
eration method, some of these have good pattern structure, and some do 
not. 


6 


Operator Classes 


Introduction 


The special properties of the numerical range of normal operators led to the 
creation of new operator classes. In particular, three classes of operators 
attracted a lot of attention. Each of these inherits one or more of the 
numerical range properties of the normal operator. 

An operator is normaloid if its norm and numerical radius are equal,
$w(T) = \|T\|$. Recall by Theorem 1.3-2 that this condition implies
$r(T) = \|T\|$.

An operator is convexoid if the closure of its numerical range is the
convex hull of its spectrum,
$\overline{W(T)} = \operatorname{co} \sigma(T)$. The notation
$\Sigma(T) = \operatorname{co} \sigma(T)$ is also commonly used.

An operator is spectraloid (also sometimes called spectral) if its
numerical and spectral radii are equal, $r(T) = w(T)$. Note that this means
that $\sigma(T) \cap \partial W(T) \ne \emptyset$.

In this chapter, we study some of the relations between these classes and 
see what additional conditions, especially those related to the numerical 
range, make them normal. There are a number of related operator classes 
which we do not treat. 


6.1 Resolvent Growth 


There are considerable differences between the three classes of operators 
mentioned in the Introduction, in spite of the fact that they share some 
properties of the normal operator. However, additional structure often
permits us to establish the equality of any two of these classes. In this
section, we provide some technical lemmas concerned with resolvent growth,
an underlying theme in this chapter, which enable us to establish such
equalities. This is done here to avoid breaking the cycle of arguments
presented later on.

Lemma 6.1-1 (Bare points). A nonempty, compact convex set in $\mathbb{C}$
is the convex hull of its bare points.


A complex number $\lambda$ is a bare point of a set $S$ in $\mathbb{C}$ if
there exists a circle passing through $\lambda$ and containing $S$. This
lemma replaces the extreme points in the Krein-Milman theorem with the bare
points; see Notes and References.


Proof. Let $S$ be the compact convex set and $A$ the set of its bare
points. If $S$ consists of one point, this is in $A$. If $S$ consists of at
least two points, its diameter is attained at two distinct points $\alpha$
and $\beta$. It is clear that $\alpha$ and $\beta$ are in $A$ by
considering circles centered at one point and passing through the other.
Thus $A$ is nonempty. Let $D$ be the closed, convex hull of $A$. Evidently,
$D \subset S$. Suppose that $D \ne S$. Then, there is a support line $L$ of
$D$ so that $D$ lies on one side of $L$ and a point $s \in S - D$ on the
other side.

Without loss of generality, let us assume that $L$ is the imaginary axis
and $D$ is contained in the left half-plane. Now suppose that the set
$E = S \cap \{(x, y) : x \le 0\}$ is contained in the region


    $\{(x, y) : -b \le x \le 0, -M \le y \le M\}$.


This is always possible since $S$ is bounded. The circle with center at
$(-c, 0)$ and passing through $(\delta, 0)$, $\delta > 0$, contains $E$ if


    $2\delta c > M^2 + b^2$.


Thus, the set of circles containing $E$ and having a nonempty intersection
with $S - D$ is nonempty. Hence, there exists a circle with center at
$(-c, 0)$ and having the smallest radius, which touches $S$ at the point
$d$, with $\operatorname{Re} d > 0$. Then $d$ is a bare point not in $D$, a
contradiction. □


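Lemma 6.1-2. The conditions


(6.1-1)    $\|(T - \lambda I)^{-1}\|\, d(\lambda, \sigma(T)) \le 1$

and

(6.1-2)    $d(\lambda, \sigma(T))\|x\| \le \|Tx - \lambda Ix\|$  for all $x \in H$


are equivalent for any $\lambda \notin \sigma(T)$.


Proof.

    $\|(T - \lambda I)^{-1}\|\, d(\lambda, \sigma(T)) \le 1 \iff$
    $\|(T - \lambda I)^{-1}y\|\, d(\lambda, \sigma(T)) \le \|y\|$  for all $y \in H \iff$
    $\|x\|\, d(\lambda, \sigma(T)) \le \|Tx - \lambda Ix\|$  choosing $y = Tx - \lambda x$. □


Lemma 6.1-3. $T$ and $\lambda$ in the previous lemma can be replaced by
$\alpha T + \beta I$ and $\alpha\lambda + \beta$, where
$\alpha, \beta \in \mathbb{C}$.

Proof. Notice that

    $(\alpha T + \beta I) - (\alpha\lambda + \beta)I = \alpha(T - \lambda I)$,
    $\|\alpha(T - \lambda I)\| = |\alpha|\, \|T - \lambda I\|$,
    and $\sigma(\alpha T + \beta I) = \alpha\sigma(T) + \beta$. □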


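Lemma 6.1-4. The following inequality holds for any
$\lambda \notin \sigma(T)$:

(6.1-3)    $d(\lambda, \sigma(T))\|(T - \lambda I)^{-1}\| \ge 1$.


Proof. Since the norm of any operator is not less than the spectral radius,
we have


(6.1-4)    $\|(T - \lambda I)^{-1}\| \ge r((T - \lambda I)^{-1}) = \sup\{|\alpha|, \alpha \in \sigma((T - \lambda I)^{-1})\}$.


Since $T - \lambda I$ is invertible, we have
$0 \notin \sigma((T - \lambda I)^{-1})$. Taking $\alpha = \frac{1}{\mu}$
and using the spectral mapping theorem, we can thus write


    $\|(T - \lambda I)^{-1}\| \ge \sup\Bigl\{\dfrac{1}{|\mu|}, \mu \in \sigma(T - \lambda I)\Bigr\}$

    $\qquad = \dfrac{1}{\inf\{|\mu|, \mu \in \sigma(T - \lambda I)\}}$

    $\qquad = \dfrac{1}{d(\lambda, \sigma(T))}$. □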


Next, we give a condition for the positivity of $T$, actually, its
dissipativeness.


Lemma 6.1-5. The conditions


(6.1-5)    $\lambda\|(T - \lambda I)^{-1}\| \le 1$  for all positive $\lambda \notin \sigma(T)$

and

(6.1-6)    $\operatorname{Re}(Tx, x) \le 0$  for all $x \in H$


are equivalent.


Proof. Let $\lambda\|(T - \lambda I)^{-1}\| \le 1$. For any $y \in H$, we
have $\lambda\|(T - \lambda I)^{-1}y\| \le \|y\|$. Choosing
$y = Tx - \lambda Ix$, we get $\lambda\|x\| \le \|Tx - \lambda Ix\|$.
Taking $\|x\| = 1$, we have


    $\lambda^2 \le \|Tx\|^2 - 2\lambda \operatorname{Re}(Tx, x) + \lambda^2$.


Thus $2\lambda \operatorname{Re}(Tx, x) \le \|Tx\|^2$.
Since $\lambda$ is arbitrary, we must have


    $\operatorname{Re}(Tx, x) \le 0$.


The converse can be proven by simply reversing the steps. □

A compact convex set can be obtained by intersecting either all the
half-planes containing it or all the circles containing it. This
distinction is reflected in considering it as the convex hull of its
extreme points or bare points. This idea is used in the improvement of the
Krein-Milman theorem, Lemma 6.1-1, by

G. Orland (1964). “On a Class of Operators,” Proc. Amer. Math. Soc. 
15, 75-79 

where the connection between the resolvent norm and the negativity of the 
operator, Lemma 6.1-5, is also established. 

Some more results involving bare points in convexoidity can be found in
V. Istratescu and I. Istratescu (1970). “On Bare and Semibare Points for
Some Classes of Operators,” Portugaliae Mathematica 29, Fasc. 4, 205-211.

Growth conditions on the resolvent were studied by, among others, 

G. R. Luecke (1971). “A Class of Operators on Hilbert Space,” Pacific J. 
Math. 41, 153-156. 

Many relations between resolvent growth conditions and particular opera- 
tor classes were given by 

T. Furuta (1977). “Relations between Generalized Growth Conditions and 
Several Classes of Convexoid Operators,” Canadian J. Math. 29, 1010- 
1030. 

Lemmas 6.1-2 and 6.1-4 are essentially the general relation (1.4-6),
which we stated earlier. For $\lambda \notin W(T)$, this general
relationship becomes (4.6-7), which we also stated earlier.


6.2 Three Classes 


The normaloid operators have a simple characterization in terms of the 
norms of powers of T. 


Theorem 6.2-1. An operator $T$ is normaloid iff


(6.2-1)    $\|T^n\| = \|T\|^n$  for $n = 1, 2, 3, \dots$ .


Proof. Let $T$ be normaloid. Then $\|T\|^n = [r(T)]^n = r(T^n)$ by the
spectral mapping theorem and hence


    $\|T\|^n = r(T^n) \le \|T^n\|$.

Since we always have

    $\|T\|^n \ge \|T^n\|$,


we conclude that $\|T^n\| = \|T\|^n$.
On the other hand, if

    $\|T^n\| = \|T\|^n$  for $n = 1, 2, 3, \dots$,

we have

    $r(T) = \lim_{n \to \infty} \|T^n\|^{1/n} = \lim_{n \to \infty} \|T\| = \|T\|$. □
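
The criterion of Theorem 6.2-1 is easy to test on $2 \times 2$ matrices, where the operator norm has a closed form as the square root of the largest eigenvalue of $T^*T$. The plain-Python sketch below uses hypothetical helper names (`matmul`, `matpow`, `norm2`) and two sample matrices chosen here for illustration: a diagonal matrix, which satisfies (6.2-1), and a nilpotent matrix, which does not.

```python
import math


def matmul(X, Y):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matpow(T, n):
    """T raised to the power n >= 1 by repeated multiplication."""
    P = T
    for _ in range(n - 1):
        P = matmul(P, T)
    return P


def norm2(T):
    """Operator (spectral) norm of a 2 x 2 matrix via eigenvalues of T^* T."""
    G = [[sum(complex(T[k][i]).conjugate() * T[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]
    trace = (G[0][0] + G[1][1]).real
    det = (G[0][0] * G[1][1] - G[0][1] * G[1][0]).real
    disc = max(trace * trace - 4.0 * det, 0.0)
    return math.sqrt((trace + math.sqrt(disc)) / 2.0)


D = [[2, 0], [0, 1]]   # normal, hence normaloid
N = [[0, 1], [0, 0]]   # nilpotent: the norm of N is 1 but N squared is zero

diagonal_ok = all(abs(norm2(matpow(D, n)) - norm2(D) ** n) < 1e-9
                  for n in range(1, 6))
nilpotent_gap = norm2(N) ** 2 - norm2(matpow(N, 2))
print(diagonal_ok, nilpotent_gap)
```

The nonzero gap for the nilpotent sample shows that (6.2-1) fails already at $n = 2$, so that matrix is not normaloid.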
The following cycle of theorems relates the normaloid and convexoid
operators.


Theorem 6.2-2. An operator $T$ is convexoid if $T - \lambda I$ is
normaloid for all $\lambda \in \mathbb{C}$.


Proof. Let $T - \lambda I$ be normaloid for all $\lambda \in \mathbb{C}$.
In view of Lemma 6.1-1, it is enough to prove that every bare point
$\mu \in \overline{W(T)}$ is in $\sigma(T)$. There exists a circle through
$\mu$ and containing $W(T)$. Let $\lambda$ be the center of this circle.
Then


    $|\mu - \lambda| = \sup\{d(\lambda, z) : z \in W(T)\}$
    $\qquad = \sup\{d(0, z - \lambda), z \in W(T)\}$
    $\qquad = w(T - \lambda I) = \|T - \lambda I\|$


since $T - \lambda I$ is normaloid. Since
$\mu - \lambda \in \overline{W(T - \lambda I)}$ and
$|\mu - \lambda| = \|T - \lambda I\|$, we have
$\mu - \lambda \in \sigma(T - \lambda I)$. Hence $\mu \in \sigma(T)$. □


Theorem 6.2-3. If $T$ is convexoid, then

    $\|(T - \lambda I)^{-1}\|\, d(\lambda, \Sigma(T)) \le 1$  for all $\lambda \notin \Sigma(T)$.


Proof. Since $\Sigma(T)$ is a compact convex set, given
$\lambda \in \mathbb{C}$, there exists a point $\mu$ on $\Sigma(T)$ so that
$d(\lambda, \mu) = d(\lambda, \Sigma(T))$. In view of Lemma 6.1-3, we can
choose $\mu$ as the origin and the line joining $\lambda$ and $\mu$ as the
x-axis. Since $T$ is convexoid, we may assume that $W(T)$ also lies in the
left half-plane and hence $\operatorname{Re} T \le 0$. Then using Lemma
6.1-5, we have


    $\|(T - \lambda I)^{-1}\| \le \dfrac{1}{\lambda} = \dfrac{1}{d(\lambda, \Sigma(T))}$. □


Theorem 6.2-4. If an operator T satisfies the inequality 
I(T — AT)" dQ, E(T)) < 1 
for all A ¢ X(T), then T is convexoid. 


Proof. Without loss of generality, in view of Lemma 6.1-3, we may assume
that $\Sigma(T)$ has the imaginary axis as a line of support and lies in the left
half-plane. Let $\lambda$ be any point on the real axis with $\lambda > 0$. We then have


$$\|(T - \lambda I)^{-1}\| \le \frac{1}{d(\lambda, \Sigma(T))} \le \frac{1}{\lambda},$$
implying, as in Lemma 6.1-5, that $\mathrm{Re}\, T \le 0$. Hence W(T) is also in the
left half-plane. □
Summarizing Theorems 6.2-2, 6.2-3, and 6.2-4, we have

(6.2-2)
$T - \lambda I$ is normaloid for all $\lambda \in \mathbb{C}$ $\Longrightarrow$
T is convexoid $\Longleftrightarrow$ $\|(T - \lambda I)^{-1}\| \, d(\lambda, \Sigma(T)) \le 1$
for all $\lambda \notin \Sigma(T)$.

The next two theorems relate and describe convexoid and spectraloid
operators. It is obvious from the definitions that convexoid operators are
spectraloid. In the spirit of Theorem 6.2-2, we can see that if the translates
of T are spectraloid, then T is convexoid.



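Theorem 6.2-5. An operator T is convexoid iff $T - \lambda I$ is spectraloid for
all $\lambda \in \mathbb{C}$.


Proof. If T is convexoid, so is $T - \lambda I$ and hence spectraloid. To prove
the converse, let us observe that any compact convex set X in $\mathbb{C}$ can be
written as the intersection of all the circles containing it. Thus

(6.2-3)
$$\begin{aligned}
\overline{W(T)} &= \bigcap_{\alpha} \Bigl\{\lambda : |\lambda - \alpha| \le \sup_{z \in W(T)} |z - \alpha|\Bigr\} \\
&= \bigcap_{\alpha} \{\lambda : |\lambda - \alpha| \le w(T - \alpha I)\}.
\end{aligned}$$

Similarly,

$$\Sigma(T) = \bigcap_{\alpha} \{\lambda : |\lambda - \alpha| \le r(T - \alpha I)\}.$$

Since $T - \alpha I$ is spectraloid, we observe that the above sets are identical. □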


Theorem 6.2-6 (Power equality). An operator T is spectraloid iff

(6.2-4) $w(T^k) = w^k(T)$, for $k = 1, 2, 3, \ldots$.


Proof. Assuming that

$$w(T^k) = w^k(T),$$

we have

$$w^k(T) \le \|T^k\|.$$

Hence $w(T) \le \|T^k\|^{1/k}$. Taking the limit as $k \to \infty$, we have

$$w(T) \le r(T).$$

The reverse inequality always holds.

To prove the converse, let $\lambda \in \sigma(T)$ be such that $|\lambda| = r(T) = w(T)$. By
the spectral mapping theorem, we have

$$\lambda^n \in \sigma(T^n) \subset W(T^n),$$

and hence

$$|\lambda|^n = w^n(T) \le w(T^n).$$

The reverse inequality $w(T^n) \le w^n(T)$ is the power inequality theorem
(Theorem 2.1-1). □


Example 1. A well-known class of normaloid operators is that of paranormal
operators. An operator is paranormal if

(6.2-5) $\|Tx\|^2 \le \|T^2 x\| \, \|x\|$ for all $x \in H$.

Let us show that paranormal operators are always normaloid. Taking
$\|x\| = 1$, we have

$$\|Tx\|^2 \le \|T^2 x\| \le \|T^2\| \quad \text{for all } \|x\| = 1.$$

Hence $\|T\|^2 \le \|T^2\|$. Since we always have $\|T^2\| \le \|T\|^2$, we conclude that
$\|T\|^2 = \|T^2\|$. An induction argument shows that $\|T^n\| = \|T\|^n$. We can
now use Theorem 6.2-1 to conclude that T is normaloid.
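
The induction step is not written out in the text. One way it can be carried out, sketched here under the assumptions that $T \ne 0$ and that $\|T^k\| = \|T\|^k$ is already known for $k \le n$ (with $n \ge 1$), is to apply (6.2-5) to the vector $T^{n-1}x$ with $\|x\| = 1$:

```latex
\|T^n x\|^2 = \|T(T^{n-1}x)\|^2
  \le \|T^{n+1}x\| \, \|T^{n-1}x\|
  \le \|T^{n+1}\| \, \|T\|^{n-1}.
% Taking the supremum over \|x\| = 1 and using \|T^n\| = \|T\|^n:
\|T\|^{2n} \le \|T^{n+1}\| \, \|T\|^{n-1}
\quad\Longrightarrow\quad
\|T\|^{n+1} \le \|T^{n+1}\| \le \|T\|^{n+1}.
```

The last inequality on the right is the submultiplicativity of the norm, so equality holds at step n + 1.
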

Paranormal operators are not always convexoid (see Notes and References).
However, the hyponormal operators, a subclass of the paranormal
operators, are convexoid. An operator T is hyponormal if

(6.2-6) $\|T^* x\| \le \|Tx\|$ for all $x \in H$.

We see immediately that

$$\|Tx\|^2 = (T^*Tx, x) \le \|T^*Tx\| \, \|x\| \le \|T^2 x\| \, \|x\|,$$

and hence the hyponormal operators are also paranormal and thus normaloid.
It is easy to check that all translates $T - \lambda I$ of a hyponormal
operator T are also hyponormal. Hence T is convexoid using Theorem
6.2-2.


Example 2. A normaloid operator need not be convexoid. Let $H = \mathbb{C}^3$
with the Euclidean norm, given by

$$\|f\|^2 = \|(f_1, f_2, f_3)\|^2 = |f_1|^2 + |f_2|^2 + |f_3|^2.$$

Let

$$T = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then $Tf = (f_2, 0, f_3)$ and $\|T\|^2 = \sup_{\|f\|=1} \{|f_2|^2 + |f_3|^2\} = 1$. On the
other hand,

$$(Tf, f) = f_2 \bar{f}_1 + f_3 \bar{f}_3,$$
and consequently,

$$w(T) = \sup_{\|f\|=1} |f_2 \bar{f}_1 + f_3 \bar{f}_3| = 1$$

by taking $f = (0, 0, 1)$. Hence T is normaloid.
It can be verified easily that

$$\sigma(T) = \{0, 1\}$$

and W(T) is the smallest convex set containing the point (1,0) and the set

$$\Bigl\{\lambda : |\lambda| \le \tfrac{1}{2}\Bigr\}.$$

Hence T is not convexoid.


Example 3. A convexoid operator need not be normaloid. Let $\{x_1, x_2, \ldots\}$
be an orthonormal base for $H = \ell^2$. Define $z_n = x_{2n+1}$, $n = 0, 1, 2, \ldots$,
and $z_{-n} = x_{2n}$, $n = 1, 2, 3, \ldots$. Every $x \in H$ can be written as

$$x = \sum_{k=-\infty}^{\infty} \alpha_k z_k.$$

Let us now define the operator S on H by

$$Sx = \frac{1}{2} \sum_{k=-\infty}^{\infty} \alpha_k z_{k+1},$$

where $x = \sum \alpha_k z_k$. We can check easily that

$$W(S) = \Bigl\{\lambda \in \mathbb{C} : |\lambda| < \tfrac{1}{2}\Bigr\}.$$
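Let us define the operator

$$L = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{on } \mathbb{C}^2.$$

The operator T defined on $\mathbb{C}^2 \oplus H$ by

$$T(f, g) = (Lf, Sg)$$

yields

$$W(T) = \Bigl\{\lambda \in \mathbb{C} : |\lambda| \le \tfrac{1}{2}\Bigr\} = \mathrm{co}\,\sigma(T).$$

T is not normaloid since

$$\|T\| = 1 \quad \text{and} \quad w(T) = \tfrac{1}{2}.$$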
Example 4. A spectraloid operator need not be convexoid. In $H = \mathbb{C}^3$,
consider the operator

$$T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$

We have, as in Example 2, that $\sigma(T) = \{0, 1\}$ and $W(T) = \mathrm{co}\,\{\sigma(T), S\}$,
where

$$S = \Bigl\{\lambda : |\lambda| \le \tfrac{1}{2}\Bigr\}.$$


Example 5. A slight modification of the above example produces a
spectraloid operator that is not normaloid. In $H = \mathbb{C}^3$, let the operator be

$$T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix}.$$

We have $\|T\| = 2$ and $w(T) = r(T) = 1$.
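
Some of the claims in Examples 2, 4, and 5 can be spot-checked with explicit witness vectors. The following is a minimal sketch in plain Python. The helper names `matvec`, `quad_form`, and `norm` and the particular test vectors are assumptions introduced here for illustration, and the matrices are those of the three examples as read from the text. A non-real value of (Tf, f) shows that W(T) is not contained in the segment [0, 1] = co σ(T), and a unit vector stretched to length 2 shows that the norm is at least 2.

```python
import math

def matvec(T, f):
    """Matrix-vector product T f for a matrix stored as a list of rows."""
    return [sum(t * x for t, x in zip(row, f)) for row in T]

def quad_form(T, f):
    """Inner product (T f, f), conjugate-linear in the second argument."""
    return sum(y * x.conjugate() for y, x in zip(matvec(T, f), f))

def norm(f):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in f))

s = 1 / math.sqrt(2)
T2 = [[0, 1, 0], [0, 0, 0], [0, 0, 1]]   # matrix of Example 2
T4 = [[1, 0, 0], [0, 0, 0], [0, 1, 0]]   # matrix of Example 4
T5 = [[1, 0, 0], [0, 0, 0], [0, 2, 0]]   # matrix of Example 5

# Unit vectors whose quadratic form is purely imaginary, hence outside [0, 1].
v2 = quad_form(T2, [s, s * 1j, 0])
v4 = quad_form(T4, [0, s, s * 1j])

# The second basis vector is stretched by a factor 2 in Example 5.
stretch5 = norm(matvec(T5, [0, 1, 0]))

print(v2, v4, stretch5)
```

These checks only exhibit individual points of W(T) and a lower bound for the norm; they do not compute the full numerical range.
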


Notes and References for Section 6.2 


Normaloid operators were first studied by
A. Wintner (1929). “Zur Theorie der beschränkten Bilinearformen,” Math. Z.,
Vol. 30, 228-282.

The nomenclature “convexoid” and “spectraloid” is due to Halmos [H], 
where several examples of proper inclusion between these classes are pro- 
vided. 

An exhaustive characterization of convexoid operators can be found in 
T. Furuta (1973). “Some Characterizations of Convexoid Operators,” Rev. 
Roum. Math. Pures et Appl. 18, 893-900. 

The same author raised the interesting question of whether paranormal 
operators are always convexoid, in 

T. Furuta (1967). “On the Class of Paranormal Operators,” Proc. Japan 
Acad. 43, 594-598. 

Counterexamples were given independently in 

T. Ando (unpublished, 1974). 

K. Gustafson and D. Rao (unpublished, 1985). 

For the latter, see 

D. K. Rao (1987). “Operadores Paranormales,” Revista Colombiana de 
Matematicas 21, 135-149. 

The original proof for the convexoidity of hyponormal operators was 
given by 
J. G. Stampfli (1965). “Hyponormal Operators and Spectral Density,” 
Trans. Amer. Math. Soc. 117, 469-476. 
The condition $\|(T - \lambda I)^{-1}\| \le (d(\lambda, \Sigma(T)))^{-1}$ (without the use of the
name convexoid) gave rise to a corresponding class of operators in the
early 1960s. Some of the results proved in Orland’s paper (see References
for Section 6.1) also appear in
G. Lumer (1961). “Semi-inner-product Spaces,” Trans. Amer. Math. Soc.
100, 24-43.

G. Lumer and R. S. Phillips (1961). “Dissipative Operators in a Banach
Space,” Pacific J. Math. 11, 679-698.
in the context of operators on Banach spaces rather than Hilbert spaces.

In Section 1.4, we commented on the fact that normal operators satisfied
the exact estimate (1.4-4), namely, that $\|(T - \lambda I)^{-1}\|^{-1} = d(\lambda, \sigma(T))$ for
all $\lambda \notin \sigma(T)$. This condition was one of the early jumping-off points in the
development of the operator classes discussed in this section. The class of
all operators T satisfying this condition was called the class $G_1$. It turns
out that this class falls properly between the hyponormal operators and
the convexoid operators.

Another class of operators, known as centroid operators, was introduced
in
S. Prasanna (1981). “The Norm of a Derivation and the Björck-Thomée-Istratescu
Theorem,” Math. Japonica 26, 585-588.

Let $D_\sigma$ and $D_W$ be the smallest disks containing $\sigma(T)$ and W(T), respectively,
with radii $R_T$ and $W_T$ and centers at $z_T$ and $w_T$. Further, let
$B_T = \sup_{\|x\|=1} \{\|Tx\|^2 - |(Tx, x)|^2\}$. Then $M_T = B_T^{1/2}$ is the distance of
T to the scalars and is called the transcendental radius of T. The operator
T is called centroid if $T - z_T I$ is normaloid. Clearly, centroid operators are
a class of operators beyond spectraloid. For further information, see

T. Furuta, S. Izumino and S. Prasanna (1982). “A Characterization of
Centroid Operators,” Math. Japonica 27, 105-106.
A set S is called spectral for an operator T if it contains $\sigma(T)$ and

(6.3-1) $\|f(T)\| \le \sup\{|f(z)| : z \in S\}$

for all rational functions f with no poles in S.

Notice that if S is a spectral set for T, any set containing S will also be
spectral.

Theorem 6.3-1. Let W(T) be spectral for T. Then $T - \lambda I$ is normaloid
for all $\lambda \in \mathbb{C}$.


Proof. Consider the function $f(z) = z - \lambda$. Then, the spectrality of W(T) implies

(6.3-2)
$$\begin{aligned}
\|T - \lambda I\| &\le \sup\{|z - \lambda| : z \in W(T)\} \\
&= d(\lambda, W(T)) = w(T - \lambda I).
\end{aligned}$$

Hence $T - \lambda I$ is normaloid. □


Corollary 6.3-2. If W(T) is a spectral set for T, then T is convexoid.
Proof. This follows immediately from Theorem 6.2-2. □


Corollary 6.3-3. If $\mathrm{co}\,\sigma(T)$ is a spectral set for T, then $T - \lambda I$ is normaloid
for all $\lambda \in \mathbb{C}$ and T is convexoid.


Proof. If $\mathrm{co}\,\sigma(T)$ is spectral for T, so is W(T). □

We now characterize the operators T for which W(T) is a spectral set.
We borrow a theorem from dilation theory (see Notes and References)
which says that for any compact convex set S that is spectral for T, there
exists a normal operator N defined on a larger Hilbert space $K \supset H$ such
that $\sigma(N) \subset \partial S$ and $T^n x = P N^n x$, $x \in H$, $n = 0, 1, 2, \ldots$. Such a dilation
N is called a strong normal dilation of T (see Section 2.6).


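Theorem 6.3-4. If $\mathrm{co}\,\sigma(T)$ is spectral for T, then there exists a strong
normal dilation N of T such that

(6.3-3) $\mathrm{co}\,\sigma(T) = W(T) = W(N)$.


Proof. Let N be the dilation such that $\sigma(N) \subset \partial\, \mathrm{co}\,\sigma(T)$ and $T^n x = P N^n x$
for $x \in H$, $n = 0, 1, 2, \ldots$. Then $W(T) = \{(Nx, x) \mid x \in H, \|x\| = 1\} \subset W(N)$.
Since N is normal, $W(N) = \mathrm{co}\,\sigma(N)$. Hence $W(T) \subset \mathrm{co}\,\sigma(N) = W(N)$.
Also, $\mathrm{co}\,\sigma(N) \subset \mathrm{co}\, \partial\, \mathrm{co}\,\sigma(T) = \mathrm{co}\,\sigma(T) \subset W(T)$. Hence
$\mathrm{co}\,\sigma(T) = W(T) = W(N)$. □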

From Theorem 6.3-4, one immediately obtains the following corollary.


Corollary 6.3-5. If W(T) is a spectral set for T, then there is a strong
normal dilation N of T such that W(T) = W(N).


Proof. By Corollary 6.3-2, T is convexoid. □


Theorem 6.3-6. If there is a strong normal dilation N of T such that
W(T) = W(N), then W(T) is a spectral set for T.


Proof. For any $\lambda$ with $|\lambda| > \|N\| = w(N) = w(T)$, we have series expansions
for $(T - \lambda I)^{-1}$ and $(N - \lambda I)^{-1}$, where $T^n = P N^n$. So $((T - \lambda I)^{-1}x, y) =
((N - \lambda I)^{-1}x, y)$ for $x, y \in H$. Since this equality holds for all $\lambda \notin W(T)$,
we have for any rational function f that has no poles in W(T),

$$(f(T)x, y) = (f(N)x, y), \qquad x, y \in H.$$

Hence

$$\begin{aligned}
\|f(T)\| \le \|f(N)\| &= \sup\{|f(z)| : z \in \sigma(N)\} \\
&\le \sup\{|f(z)| : z \in W(N) = W(T)\}.
\end{aligned}$$

Notes and References for Section 6.3 


Spectral sets were first introduced by Von Neumann in an attempt to ex- 
tend spectral theory to nonnormal operators. For polynomials p(T) of a 
contraction (||T|| < 1) and S the closed unit disk, the spectral set condition 
(6.3-1) is satisfied. See 

J. von Neumann (1951). “Eine Spektraltheorie für Allgemeine Operatoren
eines Unitären Raumes,” Math. Nachr. 4, 258-281.

See [RN] for the classical theory of spectral sets. Their relation to nu- 
merical range was studied by 
M. Schreiber (1963). “Numerical Range and Spectral Sets,” Michigan 
Math. J. 10, 283-288. 

S. Hildebrandt (1964). “The Closure of the Numerical Range as a Spectral 
Set,” Comm. Pure Appl. Math., 415-421. 

See also 

S. Hildebrandt (1966). “Über den Numerischen Wertebereich eines Operators,”
Math. Annalen 163, 230-247.

K. Gustafson (1972). “Necessary and Sufficient Conditions for Weyl’s The- 
orem,” Michigan Math. J. 19, 71-81. 

The theorem we used for dilation is due to 
A. Lebow (1963). “On Von Neumann’s Theory of Spectral Sets,” J. Math. 
Anal. Appl. 7, 64-90. 

The simple proof of Theorem 6.3-1 is due to 
T. Saito and T. Yoshino (1965). “On a Conjecture of Berberian,” Tohoku 
Math. J. 17, 147-149. 

If a triangle contains W(T), then the circumcircle of this triangle is a 
spectral set for T because at least one of the vertices is at a distance from 
the origin greater than or equal to ||T||. This result is given in 
B. A. Mirman (1968). “Numerical Range and Norm of a Linear Operator,” 
Trudy Seminara po Funkcional Analizu 10, 51-55. 

From this it follows that if the numerical range W(T) is a triangle, then
for all complex numbers $\alpha$ and $\beta$, the operator $\alpha T + \beta I$ is normaloid.
6.4 Normality Conditions


The three classes of operators described in Section 6.2 are far from being 
normal, although they share some properties with normal operators. Now 
we will present some results relating these classes to normality. 


Theorem 6.4-1. The following conditions on an operator T are equivalent:

(6.4-1)
(a) T is normal.
(b) Every quadratic polynomial in T and T* is normaloid.
(c) Every quadratic polynomial in T and T* is convexoid.
(d) Every quadratic polynomial in T and T* is spectraloid.


Proof. (a) => (b). If T is normal, so is every quadratic polynomial
p(T, T*) in T and T*. Consequently, p(T, T*) is normaloid.

(b) => (c). The quadratic polynomials $p(T, T^*) + \lambda I$, $\lambda \in \mathbb{C}$, are all
normaloid, and hence p(T, T*) is convexoid by Theorem 6.2-2.

(c) => (d). The family of quadratic polynomials p(T, T*) is also spectraloid.

(d) => (c). The quadratic polynomials $p(T, T^*) + \lambda I$, $\lambda \in \mathbb{C}$,
are all spectraloid and hence p(T, T*) is convexoid, using Theorem 6.2-5.

(c) => (a). Let A and B be the selfadjoint operators given by

$$A = \tfrac{1}{2}(T + T^*) = \mathrm{Re}\, T, \qquad B = \tfrac{1}{2i}(T - T^*) = \mathrm{Im}\, T.$$

Suppose that the quadratic polynomial AB,

$$AB = \frac{1}{4i}(T + T^*)(T - T^*),$$

is convexoid. Without loss of generality, we may assume that A and B are
strictly positive. Then we have, using Theorem 2.4-1,

(6.4-2) $\sigma(AB) \subset \dfrac{W(B)}{W(A^{-1})}$.

Hence $\sigma(AB)$ is real. Since AB is convexoid, we see that W(AB) is real.
Hence AB is selfadjoint, and this implies the normality of T. □

Other conditions equivalent to the normality of an operator T in a finite-dimensional
Hilbert space depend on the notion of reduction properties.
For any property P of operators, we say that an operator T is reduction-P
if the restriction of T to any reducing (invariant under T and T*) subspace
has property P. Similarly, an operator T is restriction-P if its restriction
to every invariant subspace has property P. Further, let us denote an
operator T as transloid if $\alpha T + \beta I$ is normaloid for $\alpha, \beta \in \mathbb{C}$.


Theorem 6.4-2. The following conditions on an operator T in a finite-dimensional
Hilbert space are equivalent:

(6.4-3)
(a) T is normal.
(b) T is reduction transloid.
(c) T is reduction normaloid.
(d) T is reduction convexoid.
(e) T is reduction spectraloid.


Proof. The proof depends on the concept of normal eigenvalue introduced
in Theorem 5.1-9, namely, an eigenvalue on the boundary of the numerical
range. Recall that if $\lambda$ is a normal eigenvalue, then the dimension of the
eigenspace M for $\lambda$ is equal to its algebraic multiplicity and T is unitarily
equivalent to $\lambda I_m \oplus B$, where $\lambda \notin \sigma(B)$.

If T is normal, so is its restriction to any reducing subspace, and so
condition (a) implies (b), (c), (d), and (e). Also, (b) => (c) by definition;
(b) => (d) by Theorem 6.2-2; (d) => (e) and (c) => (e) by definition. So the
only nontrivial implication remaining to be shown is (e) => (a).

Let T be spectraloid on a reducing subspace M. Choose $\lambda \in \sigma(T/M)$
with $|\lambda| = w(T/M)$. Then $\lambda$ is a normal eigenvalue for T, and using
Theorem 5.1-9 we can write $T = \lambda I_m \oplus B$, where $I_m$ is the identity on the
eigenspace for $\lambda$. We can now deal with B in a finite number of similar steps. □


Notes and References for Section 6.4 


Conditions for normality of the classes of operators considered were given 
by 
S. K. Berberian (1970). “Some Conditions on an Operator Implying Nor- 
mality,” Math. Ann. 184, 188-192. 

Earlier, using arbitrary polynomials instead of quadratic polynomials, a 
condition of normality was obtained by 
R. G. Douglas and P. Rosenthal (1968). “A Necessary and Sufficient Con- 
dition for Normality,” J. Math. Anal. Appl. 22, 10-11. 

Earlier results can be found in 
J. G. Stampfli (1962). “Hyponormal Operators,” Pacific J. Math. 12, 
1453-1458. 

Further conditions on convexoid operators to attain normality are given 
in 
C. Meng (1963). “On the Numerical Range of an Operator,” Proc. Amer. 
Math. Soc. 14, 167-171. 
Additional conditions equivalent to normality using the restriction-P
concept can be seen in the paper by Berberian (1970). A restriction- 
normaloid operator T is normal if it is compact, as shown by 
V. Istratescu and I. Istratescu (1967). “On Normaloid Operators,” Math. 
Zeitschr. 105, 153-156. 

Conditions equivalent to normality of arbitrary matrices were given by 
R. Grone, C. Johnson, E. Sa and H. Wolkowicz (1987). “Normal Matrices,” 
Lin. Alg. and Appl. 87, 213-225. 


6.5 Finite Inclusions 


Additional relations between the three basic classes of normal-like operators 
that we have discussed can be obtained for certain finite dimensions. We 
will study such relations principally for dimensions n = 2, 3, 4. 

Following the notation of common use in this literature, let us denote,
for general n,

(6.5-1)
$N_n$ = normal operators $\in M_n$,
$R_n$ = normaloid operators $\in M_n$,
$Q_n$ = convexoid operators $\in M_n$,
$S_n$ = spectraloid operators $\in M_n$.

We always have $S_n \supset R_n$, $N_n \subset R_n \cap Q_n \cap S_n$, $Q_n \subset S_n$.


Theorem 6.5-1. $N_2 = R_2 = S_2$.
Proof. It is enough to show $S_2 \subset N_2$. By Schur’s lemma, we may assume
that

$$T = \begin{bmatrix} \lambda_1 & a \\ 0 & \lambda_2 \end{bmatrix},$$

where $\lambda_1, \lambda_2 \in \sigma(T)$ and $|\lambda_1| = r(T) = w(T)$, the latter since T is spectraloid.
But if $a \ne 0$, $\lambda_1$ is a focus in the interior of an ellipse, a contradiction.
If a = 0, T is diagonalizable. □
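
The ellipse argument can be checked numerically for a concrete 2 × 2 matrix. The sketch below is an illustration under stated assumptions: it uses the standard fact that w(T) is the maximum over θ of the largest eigenvalue of the Hermitian part of $e^{-i\theta}T$, which for $T = \begin{bmatrix} \lambda_1 & a \\ 0 & \lambda_2 \end{bmatrix}$ has a closed form. The function names `support` and `numerical_radius` and the sampling resolution are choices made here, not notation from the text.

```python
import cmath
import math

def support(l1, l2, a, theta):
    """Largest eigenvalue of the Hermitian part of e^{-i theta} T,
    where T = [[l1, a], [0, l2]] is upper triangular."""
    rot = cmath.exp(-1j * theta)
    centre = (rot * (l1 + l2) / 2).real      # mean of the rotated diagonal
    gap = (rot * (l1 - l2)).real             # difference of the rotated diagonal
    return centre + 0.5 * math.sqrt(gap ** 2 + abs(a) ** 2)

def numerical_radius(l1, l2, a, samples=3600):
    """Approximate w(T) by sampling the support function on a grid of angles."""
    return max(support(l1, l2, a, 2 * math.pi * k / samples)
               for k in range(samples))

# Eigenvalues 1 and 0, so r(T) = 1 in both cases.
w_diag = numerical_radius(1, 0, 0)      # a = 0: diagonal, w(T) = r(T)
w_coupled = numerical_radius(1, 0, 1)   # a = 1: w(T) exceeds r(T)

print(w_diag, w_coupled)
```

With a = 1 the computed numerical radius is larger than the spectral radius 1, so the eigenvalue 1 lies strictly inside W(T) and T cannot be spectraloid, which is the contradiction used in the proof.
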
The argument used above can be used to characterize spectraloid operators
in $M_n$.
Theorem 6.5-2. $T \in S_n$ iff T is unitarily similar to a matrix of the form

$$T = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix},$$

where A is diagonal and B is triangular, with $w(B) \le r(T)$.
Proof. We can assume that T is triangular and its eigenvalues $\lambda_1, \lambda_2, \ldots$
are ordered as

$$|\lambda_1| = |\lambda_2| = \cdots = |\lambda_s| \ge |\lambda_{s+1}| \ge \cdots \ge |\lambda_n|, \qquad s \ge 1.$$

Here s is the number of eigenvalues whose magnitude is equal to the numerical
radius of T. Let us choose i, j such that $1 \le i \le s$ and $s + 1 \le j \le n$.
Consider the 2 × 2 submatrix of T

$$S = \begin{bmatrix} \lambda_i & t_{ij} \\ 0 & \lambda_j \end{bmatrix},$$

where $|\lambda_i| > |\lambda_j|$. By the submatrix inclusion $W(S) \subset W(T)$, $w(S) \le w(T) = |\lambda_i|$.
As in Theorem 6.5-1, we can show $w(S) > |\lambda_i|$ unless $t_{ij} = 0$,

$$r(T) = w(T) = \max\{w(A), w(B)\} = |\lambda_1|.$$

The proof in the other direction is straightforward. □


Corollary 6.5-3. $N_2 = Q_2 = R_2 = S_2$.
Proof. Observe that $Q_2 \subset S_2$. □
Theorem 6.5-4. $N_3 = Q_3$.


Proof. If the three eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $T \in Q_3$ are collinear, then T
is normal since a rotation of a translate of T is selfadjoint.

Let us then consider the general case when W(T) is the triangle $\Delta$ whose
vertices are $\lambda_1, \lambda_2, \lambda_3$. T can be assumed to be

$$T = \begin{bmatrix} \lambda_1 & t_{12} & t_{13} \\ 0 & \lambda_2 & t_{23} \\ 0 & 0 & \lambda_3 \end{bmatrix}.$$

Consider the submatrix

$$S = \begin{bmatrix} \lambda_1 & t_{13} \\ 0 & \lambda_3 \end{bmatrix}.$$

W(S) is an ellipse with foci at $\lambda_1, \lambda_3$ and minor axis $|t_{13}|$. This ellipse
is not contained in $\Delta$ unless $t_{13} = 0$. Since $T \in Q_3$, $t_{13}$ must be zero.
Similarly, $t_{12} = t_{23} = 0$. Hence T is diagonalizable. □


Corollary 6.5-5. $N_4 = Q_4$.


Proof. If $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ form a quadrilateral, we can use the argument of
Theorem 6.5-4. There can be at most one point that is not a vertex of
the triangle formed by the three other eigenvalues, and the same argument
carries through. □


Corollary 6.5-6. $Q_4 \subset R_4$.
Proof. $Q_4 = N_4$. □
Summarizing the above results, we have the following situation.

(6.5-2)
$N_2 = Q_2 = R_2 = S_2$, n = 2,
$N_n = Q_n \subset R_n \subset S_n$, n = 3, 4.


Some of the proper inclusions can be seen in the examples of Section 6.2. 


Notes and References for Section 6.5 


Basic theorems useful in this context were given by 
B. N. Moyls and M. D. Marcus (1955). “Field Convexity of a Square Matrix,”
Proc. Amer. Math. Soc. 6, 981-983.

An extensive study of these relations can be found in 
M. Goldberg and G. Zwas (1974). “On Matrices Having Equal Spectral 
Radius and Spectral Norm,” Lin. Alg. Appl. 8, 427—434. 
M. Goldberg, E. Tadmor and G. Zwas (1975). “The Numerical Radius
and Spectral Matrices,” Lin. Multilin. Alg. 2, 317-326.
M. Goldberg and G. Zwas (1976). “Inclusion Relations Between Certain 
Sets of Matrices,” Lin. Multilin. Alg. 4, 55-60. 


6.6 Beyond Spectraloid 


From the viewpoint of operator theory, classes of operators with normal-like
properties, such as the normaloid, convexoid, and spectraloid operators, are 
interesting structurally within the class of all normal-like operators. For 
example, the subnormal operators enlarged the class of operators possessing 
nontrivial, closed invariant subspaces (see [H]). However, we would like 
to advance the perspective in this final section that a substantial part of 
the future of numerical range research must reach beyond the normal-like 
operators. 

For example, consider the operators T and S used in the counter-example 
to the double-commute conjecture that was discussed in Section 2.5 and was 
also mentioned in Section 5.3. These are operators T and S with known 
positive w(T) and w(S) and for which w(TS) is greater than $w(T)\|S\|$.
On the other hand, the spectral radii r(S) = r(T) = r(TS) are all zero. 
In other words, these rather simple finite composite shift operators, all 
of which would be spectraloid in their infinite-dimensional versions, are 
already well beyond normal-like in finite dimensions. 

Another example of this type is

(6.6-1)
$$S = \begin{bmatrix} S_3 & 0 & 0 \\ 0 & S_3 & 0 \\ 0 & 0 & S_3 \end{bmatrix}, \qquad
S_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

Forming

$$T = \begin{bmatrix} 0 & I_3 & S_3 \\ 0 & 0 & I_3 \\ 0 & 0 & 0 \end{bmatrix}, \qquad I_3 = \text{Identity},$$

we have ST = TS, $\|S\| = 1$. For any $x = (x_1, \ldots, x_9)$,

$$Tx = (x_4 + x_8, \; x_5 + x_9, \; x_6, \; x_7, \; x_8, \; x_9, \; 0, \; 0, \; 0),$$

and hence

(6.6-2)
$$(Tx, x) = x_4 \bar{x}_1 + x_8 \bar{x}_1 + x_5 \bar{x}_2 + x_9 \bar{x}_2 + x_6 \bar{x}_3 + x_7 \bar{x}_4 + x_8 \bar{x}_5 + x_9 \bar{x}_6.$$

As w(T) is the supremum over all expressions in (6.6-2) for $\sum (x_i)^2 = 1$
and $x_i \ge 0$, we may permute T by mapping as follows:

$$(x_1, \ldots, x_9) \to (y_7, y_4, y_1, y_8, y_5, y_2, y_9, y_6, y_3).$$

Then

(6.6-3)
$$w(T) = \sup_{\sum y_i^2 = 1} \sum_{i=1}^{8} y_i y_{i+1} = \cos\Bigl(\frac{\pi}{10}\Bigr),$$

so that T, composed from $S_3$ and $I_3$ submatrices, has the same numerical
radius as $S_9$. One can show that ST has a cycle involving vertices 1, 5,
and 9, so w(ST) = 1.
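
The numbers in this example can be reproduced with a few lines of plain Python. The sketch below is illustrative only: the helper names (`shift`, `matmul`, `block_matrix`, `numerical_radius_nonneg`) and the iteration count are assumptions introduced here. It relies on the fact, used in the text above, that for an entrywise nonnegative matrix the supremum defining w is attained on nonnegative vectors, so w(M) is the largest eigenvalue of the symmetric part (M + Mᵀ)/2. That eigenvalue is found by power iteration on the symmetric part shifted by the identity, so that the top eigenvalue is also the largest in modulus.

```python
import math

def shift(n):
    """n x n upper shift S_n: ones on the superdiagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def block_matrix(blocks):
    """Assemble a matrix from a square grid of equally sized square blocks."""
    size = len(blocks[0][0])
    return [[blocks[bi][bj][i][j] for bj in range(len(blocks)) for j in range(size)]
            for bi in range(len(blocks)) for i in range(size)]

def numerical_radius_nonneg(M, iters=2000):
    """w(M) for an entrywise nonnegative M, computed as the top eigenvalue of
    (M + M^T)/2 via power iteration on the shifted matrix (M + M^T)/2 + I."""
    n = len(M)
    H = [[(M[i][j] + M[j][i]) / 2 + (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = math.sqrt(sum(x * x for x in w))
        v = [x / scale for x in w]
    Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(Hv, v)) - 1.0   # undo the identity shift

Z = [[0] * 3 for _ in range(3)]
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
S3 = shift(3)

S = block_matrix([[S3, Z, Z], [Z, S3, Z], [Z, Z, S3]])
T = block_matrix([[Z, I3, S3], [Z, Z, I3], [Z, Z, Z]])
ST, TS = matmul(S, T), matmul(T, S)

w_T = numerical_radius_nonneg(T)
w_S9 = numerical_radius_nonneg(shift(9))
w_ST = numerical_radius_nonneg(ST)

print(ST == TS, w_T, w_S9, w_ST, math.cos(math.pi / 10))
```

Since $\|S\| = 1$, the comparison of interest is between `w_ST` and `w_T`: the first exceeds the second, which is the failure of $w(ST) \le w(T)\|S\|$ for this commuting pair.
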

Finite shifts are essential in many parts of electrical engineering and in
finite quantum field models. As $n \to \infty$, there is the significant spectral
radius discontinuity from $r(S_n) = 0$ to $r(S_\infty) = 1$. Moreover, perturbed
finite shifts are commonly used (as we did in Chapter 4, in connection with
pseudo-eigenvalue theory) to demonstrate high instability of spectra under
tiny perturbations. Yet the good stability properties of the numerical range
W(A) remain. As $n \to \infty$, the numerical range $W(S_n)$ is a disk of radius
$\cos(\pi/(n + 1))$, which smoothly approaches the unit disk numerical range
of the infinite-dimensional shift operator. In other words, not only does
the numerical range W(A) possess good stability properties under additive
perturbation, for certain operator classes it may also have good stability
properties during dimension change.
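
The radius quoted here can be recovered from a standard fact about tridiagonal Toeplitz matrices. The following is a sketch, with the eigenvector notation $v^{(k)}$ introduced only for this illustration:

```latex
\mathrm{Re}\, S_n = \tfrac{1}{2}(S_n + S_n^*), \qquad
(\mathrm{Re}\, S_n)\, v^{(k)} = \cos\!\Bigl(\frac{k\pi}{n+1}\Bigr) v^{(k)}, \qquad
v^{(k)}_j = \sin\!\Bigl(\frac{jk\pi}{n+1}\Bigr), \quad j, k = 1, \ldots, n.
% W(S_n) is a disk centered at 0, so its radius is the largest eigenvalue of Re S_n:
w(S_n) = \cos\!\Bigl(\frac{\pi}{n+1}\Bigr) \longrightarrow 1 \quad (n \to \infty).
```

The eigenvalue relation follows from the identity $\sin((j-1)t) + \sin((j+1)t) = 2 \sin(jt) \cos t$ with $t = k\pi/(n+1)$, together with $v^{(k)}_0 = v^{(k)}_{n+1} = 0$.
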

We have just spoken in terms of Chapters 2, 4, and 5 about going be- 
yond spectraloid. Recall also that the discretizations of most gas dynamics 
equations, such as those treated in Chapter 4, are not normal, and as 
shown there, are not hyponormal, and are not generally expected to be 
spectraloid. Indeed, if they were spectraloid, the hydrodynamic stability 
problems would all be solvable in terms of convex hull of the spectra of the 
operators. Thus, the numerical range emerges as an alternate tool, as the 
role of the spectra no longer suffices. 

In Chapter 3, we noted that we need to better understand antieigenvalues 
for operators beyond normal. With that theory’s fundamental geometri- 
cal content and relationship to the numerical range, we may expect that 
more specifics for antieigenvalues and antieigenvectors for operators beyond
spectraloid will eventually materialize. Moreover, regarded as a condition
number $\mu(HA)$ for a matrix A preconditioned by an approximate inverse
H, the antieigenvalue $\mu$ may be expected to play a role similar to that
of so-called log norms already in use for large sparse matrices A in the
numerical analysis community and in optimization theory.

There is a useful way to conclude this perspective of reaching beyond 
spectraloid, especially for finite matrices A. All such matrices may be 
thought of in terms of their Jordan canonical form A = D+U if we imagine 
that we already know the appropriate bases for such representations. The 
superdiagonal portion U is just composed of partial shifts such as we have 
referred to in this section. Thus, the beyond spectraloid operators are not 
so far away as they might at first seem. 


Notes and References for Section 6.6 


An exploration of how the antieigenvalue $\mu$ plays the role of condition
number in numerical linear algebra may be found in 

K. Gustafson (1996). “Trigonometric Interpretation of Iterative Methods,” 
Proc. Conf. on Algebraic Multilevel Iteration Methods with Applications, 
(O. Axelsson, ed.), June 13-15, Nijmegen, Netherlands, 23-29. 


Endnotes for Chapter 6 


A useful device for understanding the detailed spectral and related nu- 
merical range properties of specific operator classes is the state diagram. 
Because good expositions of the state diagram method already exist in the 
literature, we have not included a discussion of it here. For further recent 
information, see 

K. Gustafson (1996). “Operator Spectral States,” Computers Math. Ap- 
plic. to appear. 

K. Gustafson and D. Rao (1996). “Spectral States of Normal-like Opera- 
tors,” to appear. 

An underlying theme of Chapter 6 has been the appearance and use of the 
resolvent operator $(T - zI)^{-1}$ and estimates for it. Most of these estimates
are really still rather crude: just a distance to the spectrum $\sigma(T)$, or if
that won’t suffice, to the larger set W(T). Such estimates can sometimes 
be improved by use of the right variation of the numerical range or can 
motivate new variations of the numerical range such as the M-numerical 
range described in Chapter 5. 

We would like to elaborate this theme by advancing a further perspective 
on resolvent estimates that may be useful in future developments related 
to numerical range theory and application. What is a resolvent operator? 
In applications to physics (the infinite-dimensional, or operator theoretic
case), it often is an integral operator with a Green’s function kernel G(x, y).
Its numerical range $W((T - zI)^{-1}) = ((T - zI)^{-1} f, f)/(f, f)$ thus becomes
a double integral over a kernel, which weights the function f(x)f(y). If
we happened to know the full eigenfunction expansions for the physical
problem, then we could express everything (e.g., the Green’s kernel G, the
resolvent series $(T - zI)^{-1}$, the Rayleigh quotients $W((T - zI)^{-1})$) in
terms of them. It is exactly in going beyond normal operators, and more
generally beyond spectraloid operators, that relying just on eigenfunctional
and spectral information no longer suffices.

The same perspective about resolvents applies when T is a matrix A.
Resolvent estimates for $(A - zI)^{-1}$ may be thought of in terms of the numerical
range $W((A - zI)^{-1})$, which may be thought of as a double sum over a kernel
that implements a joint weighting of products $x_i y_j$. Perhaps numerical
range theory could better take into account this point of view in providing
better resolvent estimates for use in applications.

Finally, we would like to echo the suggestion given in the Endnotes to 
Chapter 5. Future numerical range research should make a strong commit- 
ment to a better understanding of the many new and important operator 
classes coming out of computational linear algebra and applications. Al- 
most all of these are beyond spectraloid. But each class carries particular 
structure properties reflecting those of the class of applications and those 
of the discretization methods employed for simulation. In this way, the 
numerical range will remain a vital and growing part of operator theory 
and matrix analysis. 